REASSEMBLY OF A TRANSMISSION CONTROL PROTOCOL (TCP) DATA
STREAM FROM PAYLOADS OF TCP SEGMENTS OF A BIDIRECTIONAL TCP
CONNECTION



Field of the Invention 

The present invention relates to computer networks and, more particularly, to a general
purpose programmable platform for acceleration of network infrastructure applications,
including reassembly of a Transmission Control Protocol (TCP) data stream from payloads of TCP
segments of a bidirectional TCP connection.

Related Applications 

The subject patent application is a continuation of, and claims priority to, U.S. Patent 
Application 10/059,770, entitled "Cumulative Status of Arithmetic Operations", filed January 28, 
2002, now U.S. Pat. No. 6,701,338; which is, in turn, a continuation of, and claims priority to, 
U.S. Patent Application Serial No. 09/283,662, entitled "Programmable System for Processing a 
Partitioned Network Infrastructure", filed 04/01/1999, now U.S. Pat. No. 6,421,730; which is, in 
turn, a continuation of, and claims priority to, U.S. Patent Application Serial No. 09/097,858, 
entitled "Packet Processing System including a Policy Engine having a Classification Unit" filed 
06/15/1998, now U.S. Pat. No. 6,157,955. 

The subject patent application is related to U.S. Patent Application 10/100,746, entitled
"Multiple Consumer - Multiple Producer Rings", filed 3/18/2002, now U.S. Pat. No. 6,625,689;
and U.S. Patent Application 10/084,815, entitled "Programmable System for Processing a
Partitioned Network Infrastructure", filed 02/27/2002, now U.S. Pat. No. 6,859,841. The patent
application is also related to U.S. Patent Application 09/282,790, entitled "Platform Permitting
Execution of Multiple Network Infrastructure Applications", filed 3/31/1999, and issued as U.S.
Pat. No. 6,401,117. The subject patent application is also related to a patent application, filed on
the same day as the present application, naming the same inventor, entitled "Language for
Handling Network Packets", U.S. Patent Application 10/748,311.



Background of the Invention 

Computer networks have become a key part of the corporate infrastructure. Organizations 
have become increasingly dependent on intranets and the Internet and are demanding much 
greater levels of performance from their network infrastructure. The network infrastructure is 
being viewed: (1) as a competitive advantage; (2) as mission critical; and (3) as a cost center. The
infrastructure itself is transitioning from 10Mb/s (megabits per second) capability to 100Mb/s
capability. Soon, infrastructure capable of 1Gb/s (gigabits per second) will start appearing on
server connections, trunks and backbones. As more and more computing equipment gets 
deployed, the number of nodes within an organization has also grown. There has been a 
doubling of users, and a ten-fold increase in the amount of traffic every year. 

Network infrastructure applications monitor, manage and manipulate network traffic in 
the fabric of computer networks. The high demand for network bandwidth and connectivity has 
led to tremendous complexity and performance requirements for this class of application. 
Traditional methods of dealing with these problems are no longer adequate. 

Several sophisticated software applications that provide solutions to the problems 
encountered by the network manager have emerged. The main areas for such applications are 
Security, Quality of Service (QoS)/Class of Service (CoS) and Network Management. Examples 
are: Firewalls; Intrusion Detection; Encryption; Virtual Private Networks (VPN); enabling
services for ISPs (load balancing and such); Accounting; Web billing; Bandwidth Optimization;
Service Level Management; Commerce; Application Level Management; Active Network
Management.

There are three conventional ways in which these applications are deployed: 

(1) On general purpose computers.

(2) Using single function boxes. 

(3) On switches and routers. 

It is instructive to examine the issues related to each of these deployment techniques.

1. General Purpose Computers.

General Purpose computers, such as PCs running NT/Windows or workstations running 
Solaris/HP-UX, etc. are a common method for deploying network infrastructure applications. 
The typical configuration consists of two or more network interfaces each providing a 
connection to a network segment. The application runs on the main processor (Pentium/SPARC 
etc.) and communicates with the Network Interface Controller (NIC) card either through 
(typically) the socket interface or (in some cases) a specialized driver "shim" in the operating 
system (OS). The "shim" approach allows access to "raw" packets, which is necessary for many 
of the packet oriented applications. Applications that are end-point oriented, such as proxies can 
interface to the top of the IP (Internet Protocol) or other protocol stack. 

The advantages of running the application on a general purpose computer include: a full 
development environment; all the OS services (IPC, file system, memory management, threads, 
I/O etc); low cost due to ubiquity of the platform; stability of the APIs; and assurance that 
performance will increase with each new generation of the general purpose computer 
technology.

There are, however, many disadvantages of running the application on a general purpose
computer. First, the I/O subsystem on a general purpose computer is optimized to provide a 
standard connection to a variety of peripherals at reasonable cost and, hence, reasonable 
performance. 32b/33MHz PCI ("Peripheral Component Interconnect", the dominant I/O connection
on common general purpose platforms today) has an effective bandwidth in the 50-75MB/s 
range. While this is adequate for a few interfaces to high performance networks, it does not 
scale. Also, there is significant latency involved in accesses to the card. Therefore, any kind of 
non-pipelined activity results in a significant performance impact. 

Another disadvantage is that general purpose computers do not typically have good 
interrupt response time and context switch characteristics (as opposed to real-time operating 
systems used in many embedded applications). While this is not a problem for most computing 
environments, it is far from ideal for a network infrastructure application. Network 
infrastructure applications have to deal with network traffic operating at increasingly higher 
speeds and less time between packets. Short interrupt response times and short context switch
times are essential.

Another disadvantage is that general purpose platforms do not have any specialized 
hardware that assists with network infrastructure applications. With rare exception, none of the
instruction sets for general purpose computers are optimized for network infrastructure 
applications. 

Another disadvantage is that, on a general purpose computer, typical network 
applications are built on top of the TCP/IP stack. This severely limits the packet processing 
capability of the application.

Another disadvantage is that packets need to be pulled into the processor cache for
processing. Cache fills and write backs become a severe bottleneck for high bandwidth networks. 

Finally, general purpose platforms use general purpose operating systems (OS's). These 
operating systems are generally not known for having quick reboots on power-cycle or other 
wiring-closet appliance oriented characteristics important for network infrastructure applications. 

2. Fixed-function Appliances. 

There are a couple of different ways to build single function appliances. The first way is 
to take a single board computer, add in a couple of NIC cards, and run an executive program on 
the main processor. This approach avoids some of the problems that a general purpose OS 
brings, but the performance is still limited to that of the base platform architecture (as described 
above). 

A way to enhance the performance is to build special purpose hardware that performs 
functions required by the specific application very well. Therefore, from a performance 
standpoint, this can be a very good approach. 

There are, however, a couple of key issues with special function appliances. For example, 
they are not expandable by their very nature. If the network manager needs a new application, 
he/she will need to procure a new appliance. Contrast this with loading a new application on a 
desktop PC. In the case of a PC, a new appliance is not needed with every new application. 

Finally, if the solution is not completely custom, it is unlikely that the solution is scalable. 
Using a PC or other single board computer as the packet processor for each location at which 
that application is installed is not cost-effective.

3. Switches and Routers.

Another approach is to deploy a scaled down version of an application on switches and 
routers which comprise the fabric of the network. The advantages of this approach are that: 
(1) no additional equipment is required for the deployment of the application; and (2) all of the 
segments in a network are visible at the switches. 

There are a number of problems with this approach. 

One disadvantage is that the processing power available at a switch or router is limited. 
Typically, this processing power is dedicated to the primary business of the switch/router - 
switching or routing. When significant applications have to be run on these switches or routers, 
their performance drops. 

Another disadvantage is that not all nodes in a network need to be managed in the same 
way. Putting significant processing power on all the ports of a switch or router is not cost- 
effective. 

Another disadvantage is that, even if processing power became so cheap as to be 
deployed freely at every port of a switch or router, a switch or router is optimized to move 
frames/packets from port to port. It is not optimized to process packets for applications.

Another disadvantage is that a typical switch or router does not provide the facilities that 
are necessary for the creation and deployment of sophisticated network infrastructure 
applications. The services required can be quite extensive and porting an application to run on a 
switch or router can be very difficult. 

Finally, replacing existing network switching equipment with new versions that support 
new applications can be difficult. It is much more effective to "add applications" to the network 
where needed.

What is needed is an optimized platform for the deployment of sophisticated software
applications in a network environment. 

Summary 

Described herein are techniques to perform reassembly of a Transmission Control 
Protocol (TCP) data stream from payloads of TCP segments of a bidirectional TCP connection 
between a first TCP end-point operating at a first network device and a second TCP end-point 
operating at a second network device. 
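
By way of illustration only, the reassembly described above can be sketched in Python as follows. The sketch is not the claimed method. The class names DirectionState and BidirectionalReassembler, the direction labels "a->b" and "b->a", and the decision to ignore 32 bit sequence number wraparound and connection teardown are all assumptions made for brevity. Each direction of the connection keeps its own expected sequence number, holds out-of-order payloads until the gap is filled, and trims bytes that were already delivered.

```python
from dataclasses import dataclass, field


@dataclass
class DirectionState:
    """Reassembly state for one direction of a TCP connection (hypothetical helper)."""
    next_seq: int                                   # sequence number of the next in-order byte
    pending: dict = field(default_factory=dict)     # seq -> payload waiting for a gap to fill
    stream: bytearray = field(default_factory=bytearray)

    def add_segment(self, seq: int, payload: bytes) -> None:
        # Trim bytes already delivered (retransmission or partial overlap).
        if seq < self.next_seq:
            payload = payload[self.next_seq - seq:]
            seq = self.next_seq
        if not payload:
            return
        # If two segments start at the same sequence number, keep the longer one.
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload
        self._drain()

    def _drain(self) -> None:
        # Append every held segment that now touches the in-order point.
        while True:
            ready = sorted(s for s in self.pending if s <= self.next_seq)
            if not ready:
                return
            for s in ready:
                data = self.pending.pop(s)
                skip = self.next_seq - s
                if skip < len(data):
                    self.stream += data[skip:]
                    self.next_seq += len(data) - skip


class BidirectionalReassembler:
    """One DirectionState per direction; assumes sequence numbers do not wrap."""

    def __init__(self, first_seq_a_to_b: int, first_seq_b_to_a: int) -> None:
        self.directions = {
            "a->b": DirectionState(first_seq_a_to_b),
            "b->a": DirectionState(first_seq_b_to_a),
        }

    def add(self, direction: str, seq: int, payload: bytes) -> None:
        self.directions[direction].add_segment(seq, payload)

    def stream(self, direction: str) -> bytes:
        return bytes(self.directions[direction].stream)
```

In this sketch a caller would construct one BidirectionalReassembler per connection, passing the sequence number of the first payload byte in each direction, and would call add() for every segment as it is observed; stream() then returns the bytes reassembled so far for the chosen direction.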

The present invention relates to a general purpose programmable packet processing
platform for accelerating network infrastructure applications which have been structured so as to
separate the stages of classification and action. A wide variety of embodiments of the present
invention are possible and will be understood by those skilled in the art based on the present
patent application. In certain embodiments, acceleration is achieved by one or more of the
following:

• Dividing the steps of packet processing into a multiplicity of pipeline stages and
providing different functional units for different stages, thus allowing more
processing time per packet and also providing concurrency in the processing of
multiple packets,

• Providing custom, specialized Classification Engines which are microprogrammed
processors optimized for the various functions common in predicate analysis and
table searches for these sorts of applications, and are each used as pipeline stages in
different flows,

• Providing a general purpose microprocessor for executing the arbitrary actions
desired by these applications,

• Providing a tightly coupled encryption coprocessor to accelerate common network
encryption functions,

• Reducing or eliminating the need for the applications to examine the actual contents
of the packet, thus minimizing the movement of packet data and the effects of that
data movement on the processor's cache/bus/memory subsystem, and

• Either eliminating or providing special hardware to accelerate system overheads
common to embedded network applications run on general purpose platforms; this
includes special support for managing buffer pools, for communication among units
and the passing of buffers between them, and for managing the network interface
MACs (media access controllers) without the need for heavyweight device driver
programs.

• Recognizing a common policy enforcement module for network infrastructure
applications.

Certain specific embodiments are implemented with one or more of the following
features:

• a policy enforcement module consisting of Classification and associated Action
• both stateless classification and stateful classification which uses sets
• Provision of a high level interface to packet level Classification and Action (Action
and Classification Engine, ACE)
• Provision of the high level interface within common operating environments
• Policy can be changed dynamically




• Application partitioned into an AP module running on the AP (Application Processor)
and a PE (Policy Engine) module running on the PE
• AP can run operating systems with full services to facilitate application development
• PE functionality embodied as software running on AP as well as hardware and
software running on the hardware PE
• A language interface to describe Classification and to associate Actions with the
results of the Classification
• Language (NetBoost Classification Language, NCL) for Classification/Action
    • Object oriented (extensible)
    • Specific to Classification and hence very simple
    • Built in intrinsics such as checksum
    • Language constructs make it easy to describe layered protocols and protocol
    fields
    • Rule construct to associate Classification and Actions
    • Predicate construct which is a function of packet contents at any layer of any
    protocol and/or of hash search results
    • Set construct to describe hash tables and multiple searches on the same hash table
• Action code
    • Written in high level language
    • Complex packet processing possible
    • Can avail of Application Services Library (ASL) providing services useful for
    packet processing
    • ASL consists of packet management, memory management, time and event
    management, link level services, packet timestamp services, cryptographic
    services, communication services to AP module plus extensions
    • TCP/IP extensions include services such as Network Address Translation (NAT)
    for IP, TCP and UDP, Checksums, IP fragment reassembly and TCP segment
    reassembly
• System components include
    • library implementing API (DLL under Windows NT)
    • a management process called Resolver
    • an incremental compiler for NCL
    • linker for NCL code
    • dynamic linker for action code
    • operating system specific drivers which communicate with both hardware and
    software PEs
    • software Policy Engine that executes Classification and Action code
    • ASL for Action code
    • management services (Resolver and Plumber) for both application developer and
    the end user
    • development environment for AP and PE code including compilers, and other
    software development tools familiar to those skilled in the art
• ACE
    • C++ object which abstracts the packet processing associated with an application
    or sub application
    • Provides a context for Classification and Action
    • Contains one or more Target objects, including drop and default, which represent
    packet destinations
    • Provides a context for upcalls and downcalls between the AP and the PE modules
    • Targets of an ACE are connected to other ACEs or interfaces using the Plumber
    (graphical and programmatic interfaces) to specify the serialization of ACE
    processing
• Operating environment for action code
    • Invokes actions automatically when associated classification succeeds
    • Implements an ACE context
    • Low overhead (soft real time) environment
    • Handles communication between AP and PE
    • Performs dynamic linking of action code when ACEs are loaded with new
    Classification code
• Resolver
    • Maintains namespace of applications, interfaces and ACEs
    • Maps ACEs to PEs automatically
    • Contains the compiler for NCL and does dynamic compilation of NCL
    • Provides the interfaces for management of applications, ACEs and interfaces
• Compiler for NCL
    • Generates code for multiple processors (AP and PE)
    • Allows incremental compilation of rules
• Plumber
    • Allows interconnection of ACEs
    • Allows binding to interfaces
    • Supports secure remote access



BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a system in accordance with the present invention. 
FIG. 2 is a block diagram showing packet flow according to an embodiment of the 
present invention. 

FIG. 3 is a Policy Engine ASIC block diagram according to the present invention. 

FIG. 4 is a sample system-level block diagram related to the present invention. 

FIG. 5 shows a ring array in memory related to the present invention.

FIG. 6 shows an RX Ring Structure related to the present invention.

FIG. 7 shows a receive buffer format related to the present invention.

FIG. 8 shows a TX Ring Structure related to the present invention.

FIG. 9 shows a transmit buffer format related to the present invention.

FIG. 10 shows a reclassify ring structure related to the present invention.

FIG. 11 shows a Crypto Ring and COM[4:0] Rings Structure related to the present
invention.

FIG. 12 shows a DMA Ring Structure related to the present invention.

FIG. 13 is a classification engine block diagram related to the present invention.

FIG. 14 is a pipeline timing diagram for the classification engine related to the present
invention.

FIG. 15 is an application structure diagram related to the present invention.

FIG. 16 is a diagram showing an Action Classification Engine (ACE) related to the
present invention.

FIG. 17 shows a cascade of ACEs related to the present invention.

FIG. 18 shows a system architecture related to the present invention.

FIG. 19 shows an application deploying six ACEs related to the present invention.

Detailed Description of Preferred Embodiments 

Network infrastructure applications generally contain both time critical and non time
critical sections. The non time critical sections generally deal with setup, configuration, user
interface and policy management. The time critical sections generally deal with policy
enforcement. The policy enforcement piece generally has to run at network speeds. The present
invention pertains to an efficient architecture for policy enforcement that enables application of
complex policy at network rates.

Figure 1 shows a Network Infrastructure Application, called Application 2, being
deployed on an Application Processor (AP) 4 running a standard operating system. The policy
enforcement section of the Application 2, called Wire Speed Policy 3, runs on the Policy Engine
(PE) 6. The Policy Engine 6 transforms the inbound Packet Stream 8 into the outbound Packet
Stream 10 per the Wire Speed Policy 3. Communications from the Application Processor 4 to
the Policy Engine 6, in addition to the Wire Speed Policy 3, consist of control, policy
modifications and packet data as desired by the Application 2. Communication from the Policy
Engine 6 to the Application Processor 4 consists of status, exception conditions and packet data
as desired by the Application 2.

In a preferred embodiment of a Policy Engine (PE) according to the present invention,
the PE provides a highly programmable platform for classifying network packets and
implementing policy decisions about those packets at wire speed. Certain embodiments provide
two Fast Ethernet ports and implement a pipelined dataflow architecture with store and forward.
Packets are run through a Classification Engine (CE) which executes a programmed series of
hardware assist operations such as chained field comparisons and generation of checksums and
hash table pointers, then are handed to a microprocessor ("Policy Processor" or PP) for execution
of policy decisions such as Pass, Drop, Enqueue/Delay, (de/en)capsulate, and (de/en)crypt based
on the results from the CE. Some packets which require higher level processing may be sent to
the host computer system ("Application Processor" or AP). (See Figure 4.) An optional
cryptographic ("Crypto") Processor is provided for accelerating such functions as encryption and
key management.
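
The separation between classification in the CE and policy decisions in the PP can be sketched, purely for illustration, as two Python functions. Everything named here is an assumption: the PREDICATES table, the use of zlib.crc32 as a stand-in for the unspecified hash, the byte offsets (which assume an untagged Ethernet frame carrying IPv4), and the example policy in decide() are hypothetical and are not taken from the embodiments.

```python
import zlib
from enum import Enum


class Action(Enum):
    PASS = "pass"
    DROP = "drop"
    TO_AP = "to_ap"


# Hypothetical predicates: name -> (byte offset, expected bytes) in an untagged Ethernet frame.
PREDICATES = {
    "is_ipv4": (12, b"\x08\x00"),   # EtherType 0x0800
    "is_tcp": (23, b"\x06"),        # IPv4 protocol field equal to 6
}


def classify(packet: bytes, table_size: int = 256):
    """CE stage: evaluate field comparisons and derive a hash table index."""
    results = {
        name: packet[offset:offset + len(expected)] == expected
        for name, (offset, expected) in PREDICATES.items()
    }
    # Stand-in hash over the first 34 bytes; the actual hash function is not specified here.
    hash_index = zlib.crc32(packet[:34]) % table_size
    return results, hash_index


def decide(results: dict) -> Action:
    """PP stage: an arbitrary example policy driven only by the CE results."""
    if results["is_ipv4"] and results["is_tcp"]:
        return Action.PASS
    if results["is_ipv4"]:
        return Action.TO_AP   # e.g. hand non-TCP IPv4 traffic to the host for further processing
    return Action.DROP
```

The point of the sketch is that decide() never inspects the packet bytes; it works only from the results produced by classify(), which mirrors the division of labor described above.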

Third party applications such as firewalls, rate shaping, QoS/CoS, network management
and others can be implemented to take advantage of this three tiered approach to filtering
packets. Support for easy encapsulation without copies combined with encryption support
allows for VPNs ("Virtual Private Networks") and other applications that require security
services.

A large parity protected synchronous DRAM (SDRAM) buffer memory is provided,
along with a PCI interface that is used for communication with the host (AP) and potentially for
peer to peer communication among Policy Engines, e.g. for applications which route and switch.

In certain embodiments the Policy Engine ASIC can be used on a PCI card both for
application software development and for use in a PC or workstation as a two interface product,
and can also be used in a multiple segment appliance with a plurality of PE's along with an
embedded Application Processor for a stand alone product.

In certain embodiments, when used in an appliance, the PE's reside on PCI segments
connected together through a plurality of PCI to PCI bridges which connect to the host PCI bus
on the Application Processor. The PCI bus is 64 bit for all agents in order to provide sufficient
bandwidth for applications which route or switch.

A sample system level block diagram is shown in FIG. 4.

FIG. 4 shows an application processor 302 which contains a host interface 304 to a PCI
bus 324. Fanout of the PCI bus 324 to a larger number of loads is accomplished with PCI to PCI
Bridge devices 306, 308, 310, and 312; each of those controls an isolated segment on a "child"
PCI bus 326, 328, 330, and 332 respectively. On three of these isolated segments 326, 328, and
330 is a number of Policy Engines 322; each Policy Engine 322 connects to two Ethernet ports
320 which connect the Policy Engine 322 to a network segment.

One of the PCI to PCI Bridges 312 controls child PCI bus 332, which provides the
Application Processor 302 with connection to standard I/O devices 314 and optionally to PCI
expansion slots 316 into which additional PCI devices can be connected.

In a smaller configuration of the preferred embodiment of the invention the number of
Policy Engines 322 does not exceed the maximum load allowed on a PCI bus 324; in that case
the PCI to PCI bridges 306, 308, and 310 are eliminated and up to four Policy Engines 322 are
connected directly to the host PCI bus 324, each connecting also to two Ethernet ports 320. This
smaller configuration may still have the PCI to PCI Bridge 312 present to isolate Local I/O 314
and expansion slots 316 from the PCI bus 324, or the Bridge 312 may also be eliminated and the
devices 314 and expansion 316 may also be connected directly to the host PCI bus 324.

I. Packet Flow

In certain embodiments, the PE utilizes two Fast Ethernet MAC's (Media Access
Controllers) with IEEE 802.3 standard Media Independent Interface ("MII") connections to
external physical media (PHY) devices which attach to Ethernet segments. Each Ethernet MAC
receives packets into buffers addressed by buffer pointers obtained from a producer consumer
ring and then passes the buffer (that is, passes the buffer pointer) to a Classification Engine for
processing, and from there to the Policy Processor. The "buffer pointer" is a data structure
comprising the address of a buffer and a software assigned "tag" field containing other
information about that buffer. The "buffer pointer" is a fundamental unit of communication
among the various hardware and software modules comprising a PE. From the PP, there are
many paths the packet can take, depending on what the application(s) running on the PP decide is
the proper disposition of that packet. It can be transmitted, sent to Crypto, delayed in memory,
passed through a Classification Engine again for further processing, or copied from the PE's
memory over the PCI bus to the host's memory or to a peer device's memory, using the DMA
engine. The PP may also gather statistics on that packet into records in a hash table or in general
memory. A pointer to the buffer containing both the packet and data structures describing that
packet is passed around among the various modules.
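
One way a buffer address and a software assigned tag could share a single word is sketched below in Python. The layout is an assumption made for illustration only: the description refers to 2KB buffers and to tag bits in a 32 bit buffer pointer, and the sketch supposes that buffers are 2KB aligned so that the low 11 bits of the word are free to carry the tag. The function names and the constant names are hypothetical.

```python
BUFFER_SIZE = 2048                 # 2KB buffers
TAG_BITS = 11                      # assumption: low bits freed by 2KB alignment carry the tag
TAG_MASK = (1 << TAG_BITS) - 1
WORD_MASK = 0xFFFFFFFF             # 32 bit buffer pointer


def make_buffer_pointer(address: int, tag: int) -> int:
    """Combine a 2KB aligned buffer address and a tag into one 32 bit word."""
    if not 0 <= address <= WORD_MASK or address % BUFFER_SIZE != 0:
        raise ValueError("address must be a 2KB aligned 32 bit value")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the assumed tag field")
    return address | tag


def split_buffer_pointer(pointer: int) -> tuple:
    """Recover (address, tag) from a word built by make_buffer_pointer."""
    return pointer & ~TAG_MASK & WORD_MASK, pointer & TAG_MASK
```

Under this assumed layout a module that only needs the buffer address masks off the tag, while a module that only needs the tag never has to touch the buffer itself.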

The PP may choose to drop a packet, to modify the contents of the packet, or to forward
the packet to the AP or to a different network segment over the PCI Bus (e.g. for routing). The
AP or PP can create packets of its own for transmission. A 3rd party NIC (Network Interface
Card) on the PCI bus can use the PE memory for receiving packets, and the PP and AP can then
cooperate to feed those packets into the classification stream, effectively providing acceleration
for packets from arbitrary networks. When doing so, adjacent 2KB buffers can be concatenated
to provide buffers of any size needed for a particular protocol.

FIG. 2 illustrates packet flow according to certain embodiments of the present invention.
Each box represents a process which is applied to a packet buffer and/or the contents of a packet
buffer 620 as shown in FIG. 7. The buffer management process involves buffer allocation 102
and the recovery of retired buffers 118. When buffer allocation 102 into an RX Ring 402 or 404
occurs, the Policy Processor 244 enqueues a buffer pointer into the RX Ring 402 or 404 and thus
allocates the buffer 620 to the receive MAC 216 or 230, respectively. Upon receiving a packet,
the RX MAC controller 220 or 228 uses the buffer pointer at the entry in the RX ring structure of
FIG. 6 which is pointed to by MFILL 516 to identify a 2KB section of memory 260 that it can use
to store the newly received packet. This process of receiving a packet and placing it into a buffer
620 is represented by physical receive 104 in FIG. 2.

The RX MAC controller 220 or 228 increments the MFILL pointer 516 modulo ring size
to signal that the buffer 620 whose pointer is in the RX Ring 402 or 404 has been filled with a
new packet 610 and 612 plus receive status 600 and 602. The Ring Translation Unit 264 detects
a difference between MFILL 516 and MCCONS 514 and signals to the classification engine 238
or 242, respectively, for RX Ring 402 or 404, that a newly received packet is ready for
processing. The Classification Engine 238 or 242 applies Classification 106 to that packet and
creates a description of the packet which is placed in the packet buffer software area 614, then
increments MCCONS 514 to indicate that it has completed classification 106 of that packet. The
Ring Translation Unit 264 detects a difference between MCCONS 514 and MPCONS 512 and
signals to the Policy Processor 244 that a classified packet is ready for action processing 108.
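
The index comparisons just described can be modeled in software, as a rough sketch only, by the following Python class. It is not a description of the hardware. The attribute names mfill, mccons and mpcons follow the pointers named above, while the class name RxRing, the alloc index used for buffer allocation, the dictionary representation of a buffer, and the convention of leaving one slot unused to tell a full ring from an empty one are all assumptions.

```python
class RxRing:
    """Software model of one RX ring: PP allocates, MAC fills, CE classifies, PP consumes."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.slots = [None] * size
        self.alloc = 0    # hypothetical name: next slot the PP loads with an empty buffer
        self.mfill = 0    # next slot the MAC will fill with a received packet
        self.mccons = 0   # next slot the Classification Engine will consume
        self.mpcons = 0   # next slot the Policy Processor will consume

    def _next(self, index: int) -> int:
        return (index + 1) % self.size          # all indices advance modulo ring size

    def allocate(self, buffer: dict) -> bool:
        # Assumption: one slot stays unused so that "full" differs from "empty".
        if self._next(self.alloc) == self.mpcons:
            return False
        self.slots[self.alloc] = buffer
        self.alloc = self._next(self.alloc)
        return True

    def mac_receive(self, packet: bytes) -> bool:
        if self.mfill == self.alloc:            # no empty buffer has been allocated
            return False
        self.slots[self.mfill]["packet"] = packet
        self.mfill = self._next(self.mfill)
        return True

    def ce_has_work(self) -> bool:
        return self.mfill != self.mccons        # filled but not yet classified

    def classify_next(self, classifier) -> None:
        if not self.ce_has_work():
            raise IndexError("no packet waiting for classification")
        buffer = self.slots[self.mccons]
        buffer["desc"] = classifier(buffer["packet"])   # description kept with the buffer
        self.mccons = self._next(self.mccons)

    def pp_has_work(self) -> bool:
        return self.mccons != self.mpcons       # classified but not yet acted upon

    def take_next(self) -> dict:
        if not self.pp_has_work():
            raise IndexError("no classified packet waiting")
        buffer = self.slots[self.mpcons]
        self.slots[self.mpcons] = None
        self.mpcons = self._next(self.mpcons)
        return buffer
```

In the sketch each stage only ever advances its own index and only ever compares it with the index of the stage ahead of it, which is why a single ring can serve several consumers in sequence without copying the buffer.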

The Policy Processor 244 obtains the buffer pointer from the ring location pointed to by
MPCONS 512 by dequeueing that pointer from the RX Ring 402 or 404, and executes application
specific action code 108 to determine the disposition of the packet. The action code 108 may
choose to send the packet to an Ethernet Transmit MAC 218 or 234 by enqueueing the buffer
pointer on a TX Ring 406 or 408, respectively; the packet may or may not have been modified by
the action code 108 prior to this. Alternatively the action code 108 may choose to send the packet
to the attached cryptographic processor (Crypto) 246 for encryption, decryption, compression,
decompression, security key management, parsing of IPSEC headers, or other associated
functions; this entire bundle of functions is described by Crypto 112. Alternatively the action
code 108 may choose to copy the packet to a PCI peer 322 or 314 or 316, or to the host memory
330, both paths being accomplished by the process 114 of creating a DMA descriptor as shown
in Table 3 and then enqueuing the pointer to that descriptor into DMA Ring 418 by writing that
pointer to DMA_PROD 1116, which triggers the DMA Unit 210 to initiate a transfer.
Alternatively the action code 108 can choose to temporarily enqueue the packet for delay 110 in
memory 260 that is managed by the action code 108. Finally, the action code 108 can choose to
send a packet for further classification 106 on any of the Classification Engines 208, 212, 238, or
242, either because the packet has been modified or because there is additional classification
which can be run on the packet which the action code 108 can command the Classification
process 106 to execute via flags in the RX Status Word 600, through the buffer's software area
614, or by use of tag bits in the 32 bit buffer pointer reserved for that use.

Packets can arrive at the classification process 106 from additional sources besides
physical receive 104. Classification 106 may receive a packet from the output of the Crypto
processing 112, from the Application Processor 302 or from a PCI peer 322 or 314 or 316, or
from the application code 108.

Packets can arrive at the action code 108 from classification 106, from the Application
Processor 302, from a PCI peer 322 or 314 or 316, from the output of the Crypto processing 112,
and from a delay queue 110. Additionally, the action code 108 can create a packet. The
disposition options for these packets are the same as those described for the receive path, above.

The Crypto processing 112 can receive a packet from the Policy Processor 244 as
described above. The Application Processor 302 or a PCI peer 322 or 314 or 316 can also
enqueue the pointer to a buffer onto the Crypto Ring 420 to schedule that packet for Crypto
processing 112.

The TX MAC 218 or 234 transmits packets whose buffer pointers have been enqueued on
the TX Ring 406 or 408, respectively. Those pointers may have been enqueued by the action
code 108 running on the Policy Processor 244, by the Crypto processing 112, by the Application
Processor 302, or by a PCI peer 322 or 314 or 316. When the TX MAC controller 222 or 232
has retired a buffer, either by successfully transmitting the packet it contains or by abandoning
the transmit due to transmit termination conditions, it will optionally write back TX status 806
and TX Timestamp 808 if programmed to do so, then will increment MTCONS 714 to indicate that
this buffer 840 has been retired. The Ring Translation Unit 264 detects that there is a difference
between MTCONS 714 and MTRECOV 712 and signals to the Policy Processor 244 that the TX
Ring 406 or 408 has at least one retired buffer to recover; this triggers the buffer recovery
process 118, which will dequeue the buffer pointer from the TX ring 406 or 408 and either send
the buffer pointer to Buffer Allocation 102 or add the recovered buffer to a software-managed
free list for later use by Buffer Allocation 102.

It is also possible for a device in the PCI expansion slot 316 to play the role defined for
the attached Crypto processor 246 performing crypto processing 112 via DMA 111 in this flow.

1. Communication and Buffer Management

In certain embodiments, the buffer memory consists of 16 to 128 MB of parity-protected
SDRAM. It is used for packet buffers, for code and data structures for the microprocessor, as a
staging area for Classification Engine microcode loading, and for buffers used in communicating
with the AP and other PCI agents. The following uses of memory are defined by the architecture
of the Policy Engine:

• Buffer Pointer rings for RX_MAC_A, RX_MAC_B, TX_MAC_A, TX_MAC_B
(where "RX" denotes "receive", "TX" denotes "transmit", and "_A" and "_B"
indicate which instance of the MAC is being described).

• A pool of 2KB aligned buffers used for holding packets that are being processed in
this chip as well as information about those packets; larger buffers can be created by
concatenating these 2KB buffers if needed for processing larger packets from other
media.

• "Reclassification" pointer rings for each of the four Classification Engines; these are
used to schedule packets for processing on that CE when the classification of the
packet is being scheduled by an agent other than an RX MAC.

• A ring containing pointers to DMA descriptors used to schedule transfers using the
DMA engine; data copies between PCI and memory in either direction are scheduled
by enqueuing descriptor pointers on this ring.

• A pool of memory allocated for use as DMA descriptors.

• A pointer ring for scheduling packets for processing on the Crypto unit.

• An area that contains instructions for the microprocessor, including the boot
sequence.

• An area for staging microcode to be loaded into the control store of the four
Classification Engines.

• Page tables for the Policy Processor MMU.

• 16 words dedicated to mailbox communications; writes to these words from the
PCI bus also set the corresponding mailbox bit in the mailbox status register, which
signals to the processor that the indicated mailbox has a new message.

• A pool of 2KB buffers that belong to the AP and are used for scheduling transmits of
packets that have been handed to the AP for processing or that originate at the AP.

In addition to these uses, parts of the memory may be allocated to the applications running on the
PP for storing data such as local variables, counters, hash tables and the data structures they
contain, AP-to-PP and PP-to-AP application-level communications areas, external coprocessor
communication and transmit buffers, etc.

The Policy Engine takes advantage of the fact that buffers are 2KB aligned, and has the
hardware ignore the lower 11 bits of each buffer base pointer, thus enabling software to use those
pointer bits as tags.
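
The tag scheme can be illustrated with a minimal sketch. It assumes only what the text states
(2KB alignment, 11 ignored low-order bits, 32-bit pointers); the function names are hypothetical
and the meaning of any particular tag value is left to software.

```python
BUFFER_SIZE = 2048                 # 2 KB buffers, as stated in the text
TAG_BITS = 11                      # low-order bits the hardware ignores
TAG_MASK = (1 << TAG_BITS) - 1     # 0x7FF
ADDRESS_MASK = 0xFFFFFFFF          # buffer pointers are 32 bits wide


def pack_pointer(base, tag):
    """Combine a 2 KB aligned buffer base with a software tag (hypothetical helper)."""
    if base & TAG_MASK:
        raise ValueError("buffer base must be 2 KB aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in 11 bits")
    return (base | tag) & ADDRESS_MASK


def buffer_base(pointer):
    """Address the hardware would use: the pointer with its low 11 bits dropped."""
    return pointer & ADDRESS_MASK & ~TAG_MASK


def pointer_tag(pointer):
    """Software-defined tag carried in the low 11 bits."""
    return pointer & TAG_MASK
```

Because the base is recovered by masking alone, a tagged pointer can be handed to a ring without
first stripping the tag.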

A simple and lightweight mechanism for buffer allocation and recovery is provided.
Hardware support for atomic enqueue and dequeue of buffers through producer-consumer rings,
along with detection of completed (retired) buffers, enables buffer management in only a few
instructions. In the realtime executive loop run on the PP, a short section is devoted to
reclamation of free buffers into the free list from those rings which indicate to the PP that they
have retired buffers available for recovery. The RX pools of allocated, empty buffers maintained
in the RX Rings can be replenished from the free list each time a filled, classified RX buffer is
dequeued from that ring, thus maintaining the pool size. A simple linked list of buffers or other
method well known to those versed in the art can be used to implement a software-managed
free list from which to feed the pools.

In order to support atomic enqueuing/dequeuing of buffer pointers and of DMA
descriptor pointers, a standard memory-based producer/consumer ring structure is supported in
hardware for many purposes (as represented by the circle-with-arrow symbols in Figure 3). In
most cases one or more of the consumers is also a producer for the next consumer, so the rings
have a series of index pointers which chase each other in sequence; for example, the MAC RX
Rings have a Produce Pointer for the allocation of empty buffers, a MAC FILL Pointer for the
RX MAC to consume empty buffers and produce full buffers, a Classification Engine Consume
Pointer for the CE to consume freshly received buffers and to produce classified buffers, and a
Policy Processor Consume Pointer for the PP to consume classified packets, as shown in FIG 6.
The leading producer accesses the ring through an "enqueue" register, and the end consumer
accesses the ring through a "dequeue" register, obviating the need for mutexes (mutual exclusion
locks) or (slow) memory accesses in managing shared ring structures. Interim
consumer-producers fetch a buffer pointer through a ring index, then increment that index later
to signal that they have finished processing the referenced buffer and that it is available for the
next consumer.

This serialized multiple-producer/multiple-consumer ring structure allows for one ring to
support a compelled series of steps with much less hardware than would be required to support a
separate FIFO between each producer and consumer, and eliminates the need for each
consumer-producer to write pointers to the next ring; every cycle saved in a real-time system
such as this can be significant.

Hardware detects when there is a difference between a producer's ring index and the ring
index for the next consumer in that communication sequence, and signals to the consumer that
there is at least one buffer pointer in its ring for processing; thus the presence of work to do
wakes up the associated unit, implementing a dataflow architecture through the use of
hardware-managed rings.
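
The chased-index arrangement can be modeled in software as a rough sketch. The class below is a
hypothetical illustration, not the hardware design: it keeps one array and one index per stage,
and a stage has work whenever its index differs from the index of the stage ahead of it. The
stage names are taken from the RX ring discussion; the ring size of 8 and the omission of
overflow detection are simplifying assumptions.

```python
class ChasedRing:
    """Software model of one ring whose index pointers chase each other."""

    def __init__(self, size, stages):
        self.size = size
        self.entries = [0] * size
        self.stages = list(stages)              # leading producer first, end consumer last
        self.index = {name: 0 for name in self.stages}

    def _upstream(self, stage):
        position = self.stages.index(stage)
        if position == 0:
            raise ValueError("the leading producer has no upstream index")
        return self.stages[position - 1]

    def pending(self, stage):
        """Entries released by the upstream stage that this stage has not yet passed on."""
        return (self.index[self._upstream(stage)] - self.index[stage]) % self.size

    def enqueue(self, pointer):
        """Leading producer: store a pointer and advance (overflow is not modeled)."""
        lead = self.stages[0]
        self.entries[self.index[lead]] = pointer
        self.index[lead] = (self.index[lead] + 1) % self.size

    def peek(self, stage):
        """Interim stage: fetch the next pointer without releasing it downstream."""
        if self.pending(stage) == 0:
            return None
        return self.entries[self.index[stage]]

    def advance(self, stage):
        """Signal that this stage has finished with the entry at its index."""
        if self.pending(stage) == 0:
            raise RuntimeError("nothing to hand on")
        self.index[stage] = (self.index[stage] + 1) % self.size

    def dequeue(self):
        """End consumer: fetch and advance in one step, or None if there is no work."""
        tail = self.stages[-1]
        pointer = self.peek(tail)
        if pointer is not None:
            self.advance(tail)
        return pointer


ring = ChasedRing(8, ["MPROD", "MFILL", "MCCONS", "MPCONS"])
ring.enqueue(0x1000)          # empty buffer allocated to the MAC
ring.advance("MFILL")         # MAC has filled it
ring.advance("MCCONS")        # classification has finished with it
classified = ring.dequeue()   # policy processor takes the classified buffer
```

No stage ever writes a pointer into a second queue; handing work on is only an index increment,
which is the saving the text describes.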

Ring overflow, underflow, and threshold conditions are detected and reported to the ring
users and the PP as appropriate.

2. Memory and Ring Translated Memory

2.1 Memory

Main memory in the preferred embodiment consists of up to 128 MB of synchronous
DRAM (SDRAM) in two DIMMs (Dual In-line Memory Modules) or one double-sided DIMM.
Detecting the presence of the DIMMs and their attributes uses the standard Serial Presence
Detect interface, using the SPD register to manage accesses to the serial PROM. (The same
interface is used to access a serial PROM containing MAC addresses, ASIC configuration
parameters, and manufacturing information.) Depending on the size of the DIMMs installed,
memory might not be contiguous; each socket is allocated 64 MB of address space, and will alias
within that 64 MB space if a smaller DIMM is used. Alternatively, one 128 MB DIMM is
supported, in one socket only.
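
As a rough illustration of the per-socket windows, the sketch below maps an address to a socket
and an offset within the installed DIMM. It assumes that aliasing means simple wrap-around
within the 64 MB window and does not model the single 128 MB DIMM case; the helper name and the
example sizes are hypothetical.

```python
MB = 1 << 20
SOCKET_WINDOW = 64 * MB    # address space allocated to each socket, from the text


def locate(address, dimm_sizes):
    """Map an address to (socket, offset within the DIMM); hypothetical helper.

    dimm_sizes lists the installed DIMM size in bytes for each socket.
    """
    socket, window_offset = divmod(address, SOCKET_WINDOW)
    if socket >= len(dimm_sizes):
        raise ValueError("address is beyond the populated sockets")
    # A DIMM smaller than the window repeats (aliases) within it.
    return socket, window_offset % dimm_sizes[socket]
```

With two 32 MB DIMMs, addresses 0x10 and 32 MB + 0x10 reach the same location in socket 0, and
usable memory resumes at the 64 MB boundary, which is why memory might not be contiguous.
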
2.2 Ring Translated Memory

The pointer rings associated with various units are simply a region of memory which is
accessed through a translation unit. The translation unit implements the rings as a base register
(which is used to assign an arbitrary memory location to be used for the rings) plus a set of index
registers which each point to an array entry relative to the base address. Reads and writes to the
address associated with a particular index register actually access memory at the ring entry
pointed to by that index register; that is, such accesses are indirect. Some index registers are
automatically incremented after an access issued by leading producers or end consumers (for
atomic enqueue and dequeue operations), while others are incremented specifically by their owner
(generally an interim consumer-producer) to indicate that the referenced buffer has been
processed and is now available for the next consumer down the chain. Pairs of pointers have a
producer-consumer relationship, and a difference between them indicates to the consumer that
there is work to do; that difference is detected in hardware and is signaled to the appropriate unit.

There are 15 rings in the preferred embodiment, each 4 KB in size (1K entries of 4 bytes
each); the 60 KB array of 15 rings resides on a 64 KB boundary in memory. The base of this
array is pointed to by the Ring Base Register. The rings themselves are not accessed directly;
instead they appear to the users as a set of "registers" which are read or written to access the
entries in memory that are pointed to by the associated index register. For addressing purposes
each ring is assigned a number, which is used as an index both into the array in memory and into
the Ring Translation Unit (RTU) register map.

Writes to a ring will cause the data (which is generally a buffer pointer or, in the case of
the DMA Ring, a pointer to a DMA descriptor) to be stored at the location in memory pointed to
by [(RingArray[Ring #]) + (RTU index register used)], and then that index register is
incremented modulo ring size. Reads from a ring will return the data (buffer pointer or
descriptor pointer) pointed to by [(RingArray[Ring #]) + (RTU index register used)]; if that
register is an auto-increment register then it will increment modulo ring size after the read
operation. A read attempted via a consumer index register which matches its corresponding
produce pointer (that is, there was no work to do) will return zero and the index pointer will not
increment. Registers which are not auto-increment are incremented explicitly by that register's
owner when the referenced buffer has been processed; the increment is done via a hardware
signal, not by register access.
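
The translation described above can be sketched as follows. This is an illustrative model under
stated assumptions, not the RTU's register map: the register names "prod" and "cons" are
hypothetical, both are treated as auto-increment registers, and full-ring detection is omitted.
The ring geometry (15 rings of 1K four-byte entries on a 64 KB aligned base) is from the text.

```python
NUM_RINGS = 15
ENTRIES = 1024                       # 1K entries per ring
ENTRY_BYTES = 4                      # 32-bit entries
RING_BYTES = ENTRIES * ENTRY_BYTES   # 4 KB per ring


def entry_address(ring_base, ring_number, index):
    """Memory address of one ring entry: base + ring offset + entry offset."""
    if ring_base % 0x10000:
        raise ValueError("ring array must sit on a 64 KB boundary")
    if not 0 <= ring_number < NUM_RINGS:
        raise ValueError("ring number out of range")
    return ring_base + ring_number * RING_BYTES + (index % ENTRIES) * ENTRY_BYTES


class RingTranslationModel:
    """Indirect access to ring entries through auto-incrementing index registers."""

    def __init__(self, ring_base):
        self.ring_base = ring_base
        self.memory = {}     # address -> 32-bit word
        self.index = {}      # (ring number, register name) -> entry index

    def write(self, ring, reg, value):
        """Store at the entry the register points to, then increment modulo ring size."""
        i = self.index.get((ring, reg), 0)
        self.memory[entry_address(self.ring_base, ring, i)] = value
        self.index[(ring, reg)] = (i + 1) % ENTRIES

    def read(self, ring, reg, producer_reg):
        """Return the entry the register points to, or zero when there is no work."""
        i = self.index.get((ring, reg), 0)
        if i == self.index.get((ring, producer_reg), 0):
            return 0         # indexes match: return zero, leave the index unchanged
        self.index[(ring, reg)] = (i + 1) % ENTRIES
        return self.memory[entry_address(self.ring_base, ring, i)]
```

For example, with the array based at 0x00100000, entry 5 of ring 3 translates to
0x00100000 + 3 x 0x1000 + 5 x 4 = 0x00103014.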

Ring underflow/overflow and near-empty/near-full threshold status (as appropriate) are
reported through the CRISIS register to the PP and the AP.

II. Policy Engine 

FIG 3 shows a Policy Engine ASIC block diagram according to certain embodiments of
the present invention.

The ASIC 290 contains an interface 206 to an external RISC microprocessor which is
known as the Policy Processor 244. Internal to the RISC Processor Interface 206 are registers
for all units in the ASIC 290 to signal status to the RISC Processor 244.

There is an interface 201 to a host PCI Bus 280 which is used for movement of data into
and out of the memory 260, and is also used for external access to control registers throughout
the ASIC 290. The DMA unit 210 is the Policy Engine 322's agent for master activity on the PCI
bus 280. Transactions by DMA 210 are scheduled through the DMA Ring 418. The Memory
Controller 240 receives memory access requests from all agents in the ASIC and translates them
to transactions sent to the Synchronous DRAM Memory 260. Addresses issued to the Memory
Controller 240 will be translated by the Ring Translation Unit 264 if address bit 27 is a '1', or will
be used untranslated by the memory controller 240 to access memory 260 if address bit 27 is a
'0'. Untranslated addresses are also examined by the Mailbox Unit 262 and, if the address
matches the memory address of one of the mailboxes, the associated mailbox status bit is set if
the transaction is a write, or cleared if the transaction is a read. In addition to the dedicated rings
in the Ring Translation Unit 264 which are described here, the Ring Translation Unit also
implements 5 general-purpose communications rings COM[4:0] 226 which software can allocate
as desired. The memory controller 240 also implements an interface to serial PROMs 270 for
obtaining information about memory configuration, MAC addresses, board manufacturing
information, Crypto Daughtercard identification and other information.
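
The address steering and mailbox behavior can be sketched in a few lines. The sketch assumes
32-bit (4-byte) mailbox words and a caller-supplied mailbox base address, neither of which is
given in the text; the class and function names are hypothetical.

```python
RING_SPACE_BIT = 1 << 27     # address bit 27 selects ring translation


def is_ring_access(address):
    """True if the address would be routed through the Ring Translation Unit."""
    return bool(address & RING_SPACE_BIT)


class MailboxModel:
    """Status bits that follow writes and reads to a block of mailbox words."""

    def __init__(self, base, count=16, word_bytes=4):
        self.base = base              # hypothetical: location of the first mailbox word
        self.count = count
        self.word_bytes = word_bytes  # assumption: a word is 32 bits
        self.status = 0

    def observe(self, address, is_write):
        """Examine one untranslated memory transaction."""
        if is_ring_access(address):
            return                    # translated accesses are not examined
        offset = address - self.base
        if not 0 <= offset < self.count * self.word_bytes:
            return                    # not a mailbox address
        bit = 1 << (offset // self.word_bytes)
        if is_write:
            self.status |= bit        # a write marks the mailbox as holding a message
        else:
            self.status &= ~bit       # a read clears it
```

A writer therefore needs no separate doorbell register: storing the message word is what raises
the status bit, and the reader's load of that word lowers it again.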

The ASIC contains two Fast Ethernet MACs, MAC_A and MAC_B. Each contains a
receive MAC 216 or 230, respectively, with associated control logic and an interface to the
memory unit 220 or 228, respectively; and a transmit MAC 218 or 234, respectively, with
associated control logic and an interface to the memory unit 222 or 232, respectively. Also
associated with each MAC is an RMON counter unit 221 or 236, respectively, which counts
certain aspects of all packets received and transmitted in support of providing the Ethernet MIB
as defined in Internet Engineering Task Force (IETF) standard RFC 1213 and related RFCs.

RX_A Ring 402 is used by RX MAC_A controller 220 to obtain empty buffers and to
pass filled buffers to Classification Engine 238. Similarly, RX_B Ring 404 is used by RX
MAC_B controller 228 to obtain empty buffers and to pass filled buffers to Classification Engine
242. TX_A Ring 406 is used to schedule packets for transmission on TX MAC_A 218, and
TX_B Ring 408 is used to schedule packets for transmission on TX MAC_B 234.

There are four Classification Engines 208, 212, 238, and 242, which are
microprogrammed processors optimized for the predicate analysis associated with packet
filtering. The classification engines are described in FIG. 13. Packets are scheduled for
processing by these engines through the use of the Reclassify Rings 412, 416, 410, and 414,
respectively; in addition, the RX MAC controllers MAC_A 220 and MAC_B 228 can schedule
packets for processing by Classification Engines 238 and 242, respectively, through use of the
RX Rings 402 and 404, respectively.

There is a Crypto Processor Interface 202 which enables attachment of an encryption
processor 246. The RISC Processor 244 can issue reads and writes to the Crypto Processor 246
through this interface, and the Crypto Processor 246 can access SDRAM 260 and control and
status registers internal to the interface 202 through use of interface 202.

A Timestamp counter 214 is driven by a stable oscillator 292 and is used by the RX MAC
logic 220 and 228, the TX MAC logic 222 and 232, the Classification Engines 208, 212, 238,
and 242, the Crypto Processor 246, and the Policy Processor 244 to obtain timestamps during
processing of packets.

Preferably, the Policy Engine Units have the following characteristics:

1. PCI Interface

• 33 MHz operation.

• 32/64 bit data path.

• 32 bit addressing both as a target and as an initiator.

• Initiator and Target interface.

• One interrupt output.

• Up to 32 byte bursts as a master; up to 32 byte bursts to memory (BAR0) as a target
(disconnects on 32 byte boundaries), single data phase operations as a target for
Register (BAR1) and Ring Translation Unit (BAR2) spaces.

• Single configuration space for the entire device.

2. RISC Processor Interface

• Interface to external SA-110 StrongARM processor, running the bus at ASIC core
clock or half core clock as programmed in the Processor Control and Status Register.

• Handles all transaction types for PIOs (reads and writes of I/O registers), cache
fills/spills, and non-cached memory accesses.

• Low and high priority interrupt signals, driven by enabled bits of PISR and PCSR.

• Boots from main memory; an external agent must initialize memory, download local
initialization code, etc., and release processor reset to enable operation.

• Support for remap of the trap/reset vector to any location in PE Memory.

3. Classification Engine

• Microcoded engine for accelerating comparisons and hash lookups.

• Runs a set of comparisons on fields extracted from 32 bit words within a packet to
offload processor.

• Operations can be on fields in the packet, or on pairs of result bits from previous
comparisons.

• Produces a result vector of one bit result for each comparison or for each boolean
operation on pairs of bits in the vector (selected bits of which are then stored in a data
structure in the 2 KB packet buffer).

• Can also execute one or more hash lookups on one or more tables based on keys
extracted from the packet. Optimized for linked list chasing through the use of
non-blocking loads and speculative fetch of the next record; searches of hash tables
implementing conflict resolution by chaining are thus accelerated. The hash lookup
results are also stored in the packet buffer in memory.

• Arbitrary fields can be extracted from the packet and returned in the packet's data
structure to the PP. Arbitrary computation on extracted fields and result vector bits
which yield multi-bit results can also be done in the CE, and the results returned to
the PP in the data structure.

• The above computations could also incorporate operands found in hash table records
found during the above hash searches.

• The contents of hash table records found using keys extracted from the packet can be
updated with results of computations such as those described above.

• Supports fast TCP/IP checksum calculation via use of the "split add" unit.

• Decisions and branches are supported.

• Comparisons, extractions and computations, and hashing are run speculatively before
the packet is handed to the Policy Processor; if the code on the PP (the Action section
of the application) needs to run rules against the packet, the comparisons are done and
ready for it to use, with single bit decisions ("predicate analysis results") for each
policy to apply. Similarly, if the Action code needs to compute or extract information
about the packet, the results of that computation are already available in the packet's
data structure.

• Packets are scheduled for classification from both the RX MAC ring and a
reclassification ring for the "Inbound" CEs, from a reclassification ring alone for
"Outbound" CEs.

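The result-vector idea in the Classification Engine list above can be sketched in software. The
rule encoding below (word index, shift, mask, expected value) and the function name are
hypothetical; the text says only that comparisons run on fields extracted from 32 bit words and
that boolean operations combine pairs of earlier result bits.

```python
def classify(words, comparisons, combinations):
    """Return a result vector as an integer; bit i holds the outcome of rule i.

    words:        the packet as a list of 32-bit words
    comparisons:  (word_index, shift, mask, expected) tuples, one result bit each
    combinations: (op, a, b) tuples applying "and" or "or" to earlier result bits
    """
    bits = []
    for word_index, shift, mask, expected in comparisons:
        field = (words[word_index] >> shift) & mask
        bits.append(field == expected)
    for op, a, b in combinations:
        if op == "and":
            bits.append(bits[a] and bits[b])
        elif op == "or":
            bits.append(bits[a] or bits[b])
        else:
            raise ValueError("unsupported operation: " + op)
    return sum(1 << i for i, bit in enumerate(bits) if bit)


# First three words of an IPv4 header: version 4, protocol 6 (TCP).
header = [0x45000054, 0x00000000, 0x40060000]
vector = classify(
    header,
    comparisons=[
        (0, 28, 0xF, 4),      # bit 0: IP version is 4
        (2, 16, 0xFF, 6),     # bit 1: protocol is TCP
        (2, 16, 0xFF, 17),    # bit 2: protocol is UDP
    ],
    combinations=[
        ("and", 0, 1),        # bit 3: IPv4 and TCP
        ("or", 1, 2),         # bit 4: TCP or UDP
    ],
)
```

Action code can then test single bits of the vector rather than re-parsing the header, which is
the offload the list describes.
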
4. Ethernet MACs

• Standard 10/100 Mbit IEEE 802.3u compliant MAC with MII interface to external
PHY.

• Each RX MAC has support for a single unicast address match, multicast hash filter,
broadcast packets, and promiscuous mode.

• Serial MII management interface to PHY.

• RX MAC inserts packets along with receive status into 2 KB aligned buffers, with the
packet aligned so that the IP header is on a 32 bit boundary; keeping the receive
buffer ring replenished with empty buffers is the only processor interaction with the
MAC (i.e. there is no run time device driver needed for the MAC).

• Transmit MAC follows a ring of buffer pointers; scheduling of transmit buffers from
any source is supported through a register which makes enqueuing atomic, thus
allowing multiple masters to schedule transmits without mutexes.

• Mode bit for PASS or DROP of bad Ethernet packets (CRC errors etc).

• Hardware counters to support RMON ETHER statistics gathering.

• MACs operate on 2.5 MHz/25 MHz RXCLK and TXCLK from the external Fast
Ethernet PHY; each has its own clock domain and a synchronizing interface to the
ASIC core.
5. Memory Controller

• Manages up to two DIMMs of SDRAM.

• Aggressively schedules two banks independently for high performance.

• Arbitrates among many agents; priorities are:

1) MAC_A, MAC_B ping pong (top priority); internal to each MAC, the TX and RX
units arbitrate locally for the MAC's memory interface, with ping pong priority

2) Round robin priority among PP, CE_AI, CE_AO, CE_BI, CE_BO, DMA,
PCI_Target, Crypto

• Supports different speed grades of SDRAM, programmable timing.

• Parity generation and checking.

• Serial Presence Detect (SPD) interface.

• Contains the Ring Translation Unit for mapping Ring accesses to Memory
addresses.

• Contains the Mailbox address matching and status unit.

6. DMA Engine

• Can be used by PP, Crypto, and also by the host (Application Processor) and PCI peer
devices.

• Moves word aligned bursts of data between SDRAM and PCI bus.

• Data is transferred between memory and PCI in byte lane order, for endian-neutral
transfers of byte streams. See "Endianness" in Section 8.

• Each DMA is controlled by a 16 byte descriptor; the initiator first constructs a
descriptor, then enqueues a pointer to that descriptor on the DMA Ring to schedule
the transfer.

• Atomic enqueuing is supported to eliminate locks when scheduling DMAs.

• At completion of each DMA, the unit can optionally set one of 8 status bits in the
PISR (Processor Interrupt Status Register) or one of 8 status bits in the HISR (Host
Interrupt Status Register), as indicated in the descriptor.

• DMA engine ignores lower 11 bits of the SDRAM address, using a separate "buffer
offset" instead. This is to support the buffer tag field in the buffer pointer used by
software.

• Descriptor is defined in "DMA Command Queue and Descriptors" in Section 6.

• PCI command code is carried in the descriptor for flexibility.

7. Crypto Control

• PE ASIC hosts a 32 bit PCI bus for connecting to the Crypto coprocessor(s), with two
external request/grant pairs and two interrupt inputs. PP can directly access devices
on this bus.

• 4 BARs ("Base Address Registers", which are part of the PCI standard) are
supported: BAR0 for Memory, BAR1 for access to the ring status bits, BAR2 for
access to the rings, and BAR3 for prefetched access to Memory.

• Packets are scheduled for encryption by placing a Crypto descriptor in a data
structure in the packet buffer in memory, then enqueuing the pointer to that buffer in
the Crypto Ring. (Communication Ring 4 is also available for similar use with a
second coprocessor.)

• The Crypto chip will detect queue not empty by polling the CSTAT (Crypto Status
Register) register and will dequeue the buffer pointer at the head of the queue for
processing. Two rings are available so that up to two devices can be supported for
this function.

• After processing a packet, the Crypto chip will write the results back to memory and
then enqueue the buffer pointer on the specified destination ring (for further
classification, for examination on the PP, for DMA to a target on the PCI bus, or for
transmit).

8. Mailbox Unit

• Monitors 16 word-sized mailboxes in memory space.

• On address match, sets (clears) the status bit in the Mailbox Status Register associated
with the word written (read). Selected status bits contribute to a Mailbox Attention status
bit in the PISR.

9. Ring Translation Unit

• Base pointer to a 64KB region of memory (only the first 60 KB are used; the 4 KB
remainder is available for other use).

• Maintains 15 rings as memory arrays of 1K 32 bit entries each.

• Reads and writes to rings through the RTU are mapped to locations in these arrays.

• Some index registers auto increment; others are incremented by their owner.

• Delta between producer-consumer index pairs is detected in hardware. Any delta is
signaled to the consumer, indicating that there is work to do.

• 10 of the rings have specific assignment as shown in Figure 3.

• 5 general purpose rings COM[4:0] are provided for software to allocate as desired;
expected use includes a freelist for DMA descriptors and a freelist of buffers for the AP or
peers to use, messages in to the PP, and others. COM4 can optionally be used as a
second Crypto ring.

• Overflow/underflow and threshold conditions are detected and reported through the
CRISIS register in the Policy Processor interface.

10. Global TIMER

• 32 bit up counter driven from an external, asynchronous clock source.

• Counts at 1 µs in bit 3 (leaving room for finer granularity in future higher speed
implementations). Counter rolls over approximately every 536.87 seconds.

• Status bit in PISR/HISR sets on every transition (high-low and low-high) in bit [30] to
simplify software extension of the timer value.

• An Ethernet crystal (buffered copy) is used as the clock source since it is the most stable
time base available. Runs at 25 MHz.

• In multi-PE implementations, all PEs receive the same clock source to avoid relative
drift in timestamps. In systems using multiple PCI cards each containing a PE, they each
receive a local, non-aligned clock.

• Used by MACs, Classification Engines, and PP for marking events; used for monitoring
performance and packet arrival order as needed.
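
The rollover figure follows from the counter width and the weight of bit 3: if bit 3 advances
once per microsecond, the counter takes 8 counts per microsecond, so a 32 bit counter wraps after
2^32 / 8 microseconds, or about 536.87 seconds. A small sketch of that arithmetic, with
hypothetical function names, also shows how an interval that spans one rollover can be measured:

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1
COUNTS_PER_MICROSECOND = 1 << 3      # bit 3 advances once per microsecond


def rollover_seconds():
    """Time for the 32 bit counter to wrap back to zero."""
    return (1 << COUNTER_BITS) / COUNTS_PER_MICROSECOND / 1_000_000


def elapsed_microseconds(earlier, later):
    """Interval between two timestamps, valid if at most one rollover separates them."""
    return ((later - earlier) & COUNTER_MASK) / COUNTS_PER_MICROSECOND
```

Intervals longer than one rollover period need a software-extended count, which is the purpose of
the transition status bit described in the list.
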
11. Serial PROM

• Support for a 24C02 256 byte serial PROM at serial address 0x7; the memory DIMMs
are at addresses 0x0 and 0x1 for slots 0 and 1 (if supported).

• PROM at 0x7 contains two MAC addresses, full/half speed control indication for the
processor bus, manufacturing information, and other configuration and tracking
information.

• Additional devices on the SPD bus include a Crypto Daughtercard IDPROM at address
0x6, and a thermal sensor at address 0x1.

III. Data Structures

1. Ring Array in Memory

The 15 rings are packed into a 60 KB array aligned on a 64 KB boundary in memory.
The RING_BASE register points to the start of this array. Each ring is 4 KB in size and can hold
up to 1K entries of 32 bits each.

FIG. 5 illustrates a ring array in memory.

The Ring Translation Unit (RTU) 264 manages 15 arrays in memory 260 for
communication purposes. Each ring actually consists of 1024 32 bit entries in memory for a
total of 4 KB per ring, along with index registers and logic for detecting differences between the
index register for a producer and the index register for the associated consumer, which is
reported to that consumer as an indication that there is work for it to do. Various near full
threshold, near empty threshold, full, and empty conditions are detected as appropriate to each
ring and are reported to the ring users and to the Policy Processor 244 as appropriate. The RTU
264 translates Ring accesses into both a memory 260 access at a translated address, and in some
cases into commands to increment specific index pointers after completing that memory access.
Each ring is assigned a number for mapping purposes, and that number is used to index into the
array of memory 260 in which the rings are implemented. The index registers are incremented
modulo 4 KB so that FIFO behavior is achieved. Each index register contains one more
significant bit than is used for addressing, so that a full ring can be differentiated from an empty
one.
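
The purpose of the extra index bit can be shown with a short sketch. It assumes indexes that
count entries (10 addressing bits for 1024 entries, plus one extra bit); the function names are
hypothetical. With only the addressing bits, equal producer and consumer indexes could mean
either empty or full; the extra bit makes the two cases distinct.

```python
ENTRIES = 1024                       # entries per ring
SLOT_MASK = ENTRIES - 1              # bits used for addressing
INDEX_MASK = 2 * ENTRIES - 1         # addressing bits plus one extra significant bit


def advance(index, count=1):
    """Increment an index register, wrapping in the wider (extra-bit) range."""
    return (index + count) & INDEX_MASK


def slot(index):
    """Ring entry addressed by an index: the extra bit is dropped."""
    return index & SLOT_MASK


def occupancy(producer, consumer):
    """Number of entries produced but not yet consumed."""
    return (producer - consumer) & INDEX_MASK


def is_empty(producer, consumer):
    return occupancy(producer, consumer) == 0


def is_full(producer, consumer):
    return occupancy(producer, consumer) == ENTRIES
```

When the producer is exactly one full ring ahead, both indexes address the same slot but differ
in the extra bit, so the ring is reported full rather than empty.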

A Ring Base Register 400 selects the location in memory 260 of the base of the 64 KB
aligned array 440 represented in FIG 5. The structure is an array of arrays; there is an array of
15 rings indexed by the ring number, and each of those rings is a 4KB array of 1024 32 bit
entries indexed by various index registers used by different agents.
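
The address translation and the full/empty test described above can be sketched as follows. This is
only an illustrative model, not the RTU logic itself: it assumes 4 KB rings of 1024 32 bit entries,
it counts index registers in entries rather than bytes, and the names entry_address, advance,
is_empty, and is_full are hypothetical.

```python
RING_BYTES = 4096                         # assumed: each ring is a 4 KB array
ENTRY_BYTES = 4                           # 32 bit entries
ENTRIES = RING_BYTES // ENTRY_BYTES       # 1024 entries per ring
NUM_RINGS = 15
ARRAY_ALIGN = 64 * 1024                   # assumed alignment of the ring array
INDEX_MASK = 2 * ENTRIES - 1              # one bit more than addressing needs


def entry_address(ring_base, ring_number, index):
    """Memory address of the entry selected by a ring number and an index register."""
    if ring_base % ARRAY_ALIGN != 0:
        raise ValueError("ring array base must be 64 KB aligned")
    if not 0 <= ring_number < NUM_RINGS:
        raise ValueError("ring number out of range")
    slot = index % ENTRIES                # the extra wrap bit is not used for addressing
    return ring_base + ring_number * RING_BYTES + slot * ENTRY_BYTES


def advance(index):
    """Increment an index register, wrapping within its extended range."""
    return (index + 1) & INDEX_MASK


def is_empty(producer, consumer):
    """Producer and consumer agree in every bit, including the wrap bit."""
    return producer == consumer


def is_full(producer, consumer):
    """Same slot but opposite wrap bit: the producer is exactly one lap ahead."""
    return (producer ^ consumer) == ENTRIES
```

The extra index bit is what lets the two cases be told apart: without it, a producer that has
lapped the consumer would compare equal to it and a full ring would look empty.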

RX_A Ring 402 and RX_B Ring 404 implement the structure described in FIG 6, and are
associated with the receive streams from RX MAC_A 220 and RX MAC_B 228 respectively.
TX_A Ring 406 and TX_B Ring 408 implement the structure of FIG 8, and are associated with
the transmit MACs 222 and 232 respectively. The Reclassify Rings 410, 412, 414, and 416 are
used to schedule packets for classification on Classification Engines 238, 208, 212, and 212
respectively, and implement the structure shown in FIG 10.

DMA Ring 418 is used to schedule descriptor pointers for consumption by DMA Unit
210, and implements the structure shown in FIG. 12. Crypto Ring 420 is used to schedule
buffers for processing on the Crypto Processor 246 and implements the structure shown in FIG
11. The five general purpose communication rings COM[4:0] are available for assignment by
software and also implement the structure shown in FIG 11.

2. RX Buffer Pointer Ring and Produce/Consume Pointers

A ring of buffer pointers resides in the memory for each RX MAC. Associated with this
ring are produce and consume index pointers for the various users of these buffers to access
specific rings. The Policy Processor allocates free, empty buffers to the MAC by writing them to
the associated MPROD address in the Ring Translation Unit (RTU), which writes the buffer
address into the ring and increments the MPROD pointer modulo ring size. The RX MAC
chases that pointer with the MFILL index which is used to find the next available empty buffer.
That pointer is chased by MCCONS which is used by the Classification Engine to identify the
next packet to run the classification microcode on. The PP uses a status bit in the PISR to see
that there is at least one classified packet to process, then reads the ring through MPCONS in the
RTU to identify the next buffer that the PP needs to process.

FIG 6 shows an RX Ring Structure related to certain embodiments of the present
invention. There are two RX Rings 402 and 404. Each is located in the Ring Array in memory
260. Each has four index registers associated with it. FIG 6 shows the ring as an array in
memory with lower addresses to the top and higher addresses to the bottom of the picture.

The ring's base address 510 is a combination of the Ring Base Register 400 and the ring
number which is used to index into the Ring Array 440 as shown in FIG 5. Two instances of the
set of four index registers MPCONS 512, MCCONS 514, MFILL 516, and MPROD 518 are
used to provide an offset from the RX Ring Base 510 of the particular ring 402 or 404, each of
which is a 4 KB array 520.

MPROD 518 is the lead producer index for this ring. The Policy Processor 244 or the
Application Processor 302 enqueues buffer pointers into the RX Ring 402 or 404 by writing the
buffer pointer to the RTU's enqueue address for the particular ring 402 or 404, which causes the
RTU to write the buffer pointer to the location in memory 260 referenced by MPROD 518, and
then to increment MPROD 518 modulo the ring size of 4096 bytes. This process allocates an
empty buffer to the RX MAC MAC_A or MAC_B associated with ring 402 or 404 respectively.

MPROD 518 and MFILL 516 have a producer consumer relationship. Any time there is
a difference between the value of MPROD 518 and MFILL 516, the RTU 264 signals to the
associated RX MAC MAC_A or MAC_B that it has empty buffers available. The region 506 in
the RX Ring 402 or 404 represents one or more valid, empty buffers that have been allocated to
the associated RX MAC by enqueueing the pointers to those buffers.

When the RX MAC MAC_A or MAC_B receives a packet, it obtains the buffer pointer
referenced by its associated MFILL pointer 516 by reading from the RTU's MFILL address and
then writes the packet and associated RX Status 600 and RX Timestamp 602 into the buffer
pointed to by that buffer pointer. When the RX MAC has successfully received a packet and has
finished transferring it into the buffer, it increments the index MFILL 516 by a hardware signal
to the RTU which causes the RTU to increment MFILL 516 modulo the ring size of 4096 bytes.
MFILL 516 and MCCONS 514 have a producer consumer relationship; when the RTU 264
detects a difference between the value of MFILL 516 and MCCONS 514 it signals to that ring's
associated Classification Engine 238 or 212 that it has a freshly received packet to process. The
region 504 in the ring array contains the buffer pointers to one or more full, unclassified buffers
that the RX MAC has passed to the associated Classification Engine.

The Classification Engine 238 or 212 receives a signal if the RTU 264 detects full,
unclassified packets in RX Ring 402 or 404, respectively. When the dispatch microcode on that
CE 238 or 212 tests the ring status and sees this signal from the RTU 264, that CE 238 or 212
obtains the buffer pointer by reading from the RTU's MCCONS address for that ring. When the
CE 238 or 212 has finished processing that buffer and has written all results back to memory
260, it signals to the RTU 264 to increment its associated MCCONS index 514. Upon receiving
this signal the RTU 264 increments MCCONS 514 modulo the ring size of 4096 bytes. By
sending the signal, the CE 238 or 212 has indicated that it is done processing that packet and that
the packet is available for the consumer, which is action code 108 running on the Policy
Processor 244. The region 502 contains the buffer pointers for one or more full, classified
packets that the Classification Engine has passed to the Action Code 108.

MCCONS 514 and MPCONS 512 have a producer consumer relationship. When the CE
238 or 212 has produced a full, classified packet then that packet is available for consumption by
the action code 108. The RTU detects when there is a difference between the values of
MCCONS 514 and MPCONS 512 and signals this to the Policy Processor 244 through a status
register in the Processor Interface 206. The Policy Processor 244 monitors this register, and
when dispatch code on the Policy Processor 244 determines that it is ready to process a full,
classified packet it dequeues the buffer pointer of that packet from the RX Ring 402 or 404, as
appropriate, by reading the RTU's dequeue address for that ring. This read causes the RTU to
return to the Policy Processor 244 the buffer pointer referenced by that ring's MPCONS index
512, and then to increment MPCONS 512 modulo the ring size of 4096 bytes. The act of
dequeueing the buffer pointer means that the pointer no longer has any meaning in the RX ring.
The contents of the ring in locations between MPCONS 512 and MPROD 518 have no meaning,
and are indicated by the Invalid regions 500 and 508. Since this is a ring structure which wraps,
500 and 508 are actually the same region; in the figure shown, due to the current values of the ring
index pointers 512, 514, 516, and 518 the Invalid region 500 and 508 happens to wrap across
the start and end of the array containing this ring, but it should be obvious to one skilled in the
art that under normal circumstances these ring index pointers can have different values and any
of regions 502, 504, or 506 could also be the region which wraps around the end and beginning of
the array 520.
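
The chain of producer consumer relationships among the four RX ring index registers can be
illustrated with a small software model. This is a sketch under stated assumptions rather than a
description of the hardware: the class RxRing and its method names are hypothetical, the indices
are kept as free running counters instead of wrapped registers, and reading a pointer and
advancing the index are merged into one step where the text describes a read followed by a
separate signal.

```python
class RxRing:
    """Toy model of one RX ring and its four index registers."""

    SLOTS = 1024  # assumed: 4096 byte ring of 32 bit buffer pointers

    def __init__(self):
        self.slots = [None] * self.SLOTS
        # Free running counters; slot = counter % SLOTS (a modelling shortcut).
        self.mprod = self.mfill = self.mccons = self.mpcons = 0

    def enqueue_empty(self, buffer_pointer):
        """Policy Processor allocates an empty buffer (write through MPROD)."""
        if self.mprod - self.mpcons == self.SLOTS:
            raise OverflowError("ring full")
        self.slots[self.mprod % self.SLOTS] = buffer_pointer
        self.mprod += 1

    def _take(self, lead, name):
        """Return the pointer at index `name` and advance it, if it trails `lead`."""
        follow = getattr(self, name)
        if follow == lead:
            raise LookupError("no work for " + name)
        pointer = self.slots[follow % self.SLOTS]
        setattr(self, name, follow + 1)
        return pointer

    def mac_fill(self):
        """RX MAC fills the next empty buffer (MFILL chases MPROD)."""
        return self._take(self.mprod, "mfill")

    def ce_classify(self):
        """Classification Engine finishes a packet (MCCONS chases MFILL)."""
        return self._take(self.mfill, "mccons")

    def pp_dequeue(self):
        """Policy Processor dequeues a classified packet (MPCONS chases MCCONS)."""
        return self._take(self.mccons, "mpcons")

    def regions(self):
        """Entry counts for the regions named in the description of FIG 6."""
        return {
            "empty_506": self.mprod - self.mfill,
            "unclassified_504": self.mfill - self.mccons,
            "classified_502": self.mccons - self.mpcons,
            "invalid_500_508": self.SLOTS - (self.mprod - self.mpcons),
        }
```

In this model each region is simply the distance between two neighbouring indices, which is the
same difference the RTU is described as detecting in order to signal that a consumer has work.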

2.1 RX Buffer Structure

The receive data buffer is a 2 KB structure which contains an Ethernet packet and
information about that packet. A substantially similar format is used for transmitting the packet,
as indicated in Figure 8. The packet offset from the base of the buffer is designed so that upon
receive the Ether header is offset by two bytes into a word, thus aligning the IP header on a word
(32 bit) boundary. Enough space is left before the packet so that encapsulation/encryption
headers (e.g., up to 40 bytes for a standard IPv6 header plus AH and ESP) can be inserted for
encapsulation of the packet without copying the packet, by just copying the Ethernet header up to
make space and then inserting the encapsulation headers. The total pad size is 112 Bytes; if more
is needed then the Crypto Coprocessor can realign the packet when writing it back.

The RX MAC can be programmed to either drop bad packets or receive them normally; if
the latter, then error status is also shown in the buffer RX status field.

FIG 7 illustrates the receive buffer format.

A packet is passed around the system by placing it into a packet buffer 620 and then
passing the 2KB aligned buffer pointer among units via pointer rings implemented by the RTU
264. The RX Status and Transmit Command Word 600 is always located at the word pointed to
by the 2KB aligned buffer pointer. All hardware in the Policy Engine 322 is designed to assume
that a buffer pointer is 2 KB aligned and to ignore bits [10:0], which allows software to use bits
[10:0] of the buffer pointer to carry software tag information associated with that buffer.
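
As an illustration of the tagging convention just described, the following sketch packs a software
tag into bits [10:0] of a 2 KB aligned buffer pointer and recovers both parts. The function names
make_pointer, hardware_view, and software_tag are hypothetical and are not taken from the
embodiment.

```python
BUFFER_ALIGN = 2048                 # buffers are 2 KB aligned
TAG_MASK = BUFFER_ALIGN - 1         # bits [10:0], ignored by the hardware


def make_pointer(buffer_base, tag=0):
    """Combine a 2 KB aligned buffer address with a software tag."""
    if buffer_base & TAG_MASK:
        raise ValueError("buffer base must be 2 KB aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag must fit in bits [10:0]")
    return buffer_base | tag


def hardware_view(pointer):
    """The address the hardware uses: bits [10:0] are treated as zero."""
    return pointer & ~TAG_MASK


def software_tag(pointer):
    """The tag carried by software in bits [10:0]."""
    return pointer & TAG_MASK
```

Because the low eleven bits never reach the address path, a tag can travel through the pointer
rings with the buffer pointer at no extra cost in ring entries.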

Upon receiving a packet the RX MAC 220 or 228 places that packet at an offset of (130)
bytes from the beginning of a buffer 620, and writes zero to the bytes at byte offset (128) and
(129) from the beginning of that buffer; these two bytes are called the Ethernet Header Pad 618.
The packet consists of the (14) byte Ethernet header 610 and the payload 612 of the Ethernet
packet, which are stored contiguously in the buffer 620. The reason for inserting the Ethernet
Header Pad is to force protocol headers encapsulated in the Ethernet packet to be word (32 bit)
aligned for ease in further processing; encapsulated protocols such as IP, TCP, UDP etc. have
word oriented formats.

The RX MAC control logic 220 or 228 then writes the RX Status Word 600 into the
buffer 620 at an offset of (0) from the start of the buffer, and an RX Timestamp 602 as a 32 bit
word at byte offset (4) from the start of the buffer 620. The RX Status Word has the format
shown in Table 1. The timestamp is the value obtained from the Timestamp Register 214 at the
time the RX status 600 is written to the buffer 620. The TX Status Word 604 and the TX
Timestamp 606 are not written at this time, but those locations covering the two 32 bit words at
offsets of 8 and 12 bytes, respectively, from the start of the buffer 620 are reserved for later use
by the TX MAC controllers 222 and 232.

The format for the RX Status word in Table 1 is such that it can be used directly as a TX
Command Word without modification; the fields LENGTH and PKT_OFFSET have the same
meaning in both formats. The RX MAC controller 220 or 228 subtracts (4) bytes from the
Ethernet packet's length before storing the LENGTH field in the RX Status Word 600 such that
the (4 byte) Ethernet CRC is not counted in LENGTH, so that the buffer can be handed to a TX
MAC 222 or 232 without need for the Policy Processor 244 modifying the contents of the buffer.

Pad Space 608 is left before the start of the packet 610 and 612 in buffer 620 to support
the addition of encapsulating protocol headers without copying the entire packet. Up to (112)
bytes of encapsulation header(s) can be inserted simply by copying the Ethernet header 610 (and
possibly an associated SNAP encapsulation header in the start of payload 612) upwards into the
Pad Space 608 by the number of bytes necessary to make room for the inserted headers, which
are then written into the location that was opened up for them in areas 608, 610, and 612 as
needed. If more than (112) bytes of encapsulation header are being inserted then the entire
payload 612 must be copied to a different location in the buffer to make room for the inserted
headers.
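
The header insertion described above can be sketched as a short routine operating on a 2 KB
buffer. This is a minimal illustration, assuming a 14 byte Ethernet header with no SNAP header,
a pad space that begins at byte offset 16, and a hypothetical function name insert_encapsulation;
updating PKT_OFFSET and LENGTH in the status word is left to the caller.

```python
PAD_START = 16          # first byte after the four status/timestamp words
PAD_BYTES = 112         # pad space available for inserted headers
ETH_HEADER_BYTES = 14   # assumed: plain Ethernet header, no SNAP header


def insert_encapsulation(buf, pkt_offset, encap):
    """Move the Ethernet header up into the pad space and write `encap` behind it.

    Returns the new packet offset. The payload is not copied.
    """
    n = len(encap)
    new_offset = pkt_offset - n
    if n > PAD_BYTES or new_offset < PAD_START:
        raise ValueError("not enough pad space; the payload would have to be moved")
    header = bytes(buf[pkt_offset:pkt_offset + ETH_HEADER_BYTES])
    # Copy the Ethernet header upwards (towards lower addresses) by n bytes.
    buf[new_offset:new_offset + ETH_HEADER_BYTES] = header
    # The n bytes that were opened up directly in front of the payload take the new headers.
    buf[new_offset + ETH_HEADER_BYTES:pkt_offset + ETH_HEADER_BYTES] = encap
    return new_offset
```

For a packet received at byte offset 130, inserting a 20 byte header in this way moves the
Ethernet header to offset 110 while the payload stays where the RX MAC wrote it.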

The per packet software data structure 614 is used by the classification 106, action code
108, encryption processing 112, the host 302 and PCI peers 322, 311, and 316 to carry
information about the packet that is carried in the buffer 620. The location of the software data
structure 614 and the sizes of the packet header 610 and packet payload 612, as well as the total
size of the packet buffer 620 are not hard limits in the preferred embodiment. The 2KB
alignment of the RX status word 600 and RX Timestamp is enforced by the hardware; but
packets from other sources and also from other media besides Ethernet can be injected into the
classification flow of FIG. 2 as follows. The SOURCE field of the RX status word 600 as shown
in Table 1 has only a few reserved codes; the rest can be assigned by software to identify packets
from other sources and also from other media which do not share the packet format or packet
size of Ethernet. By software convention larger buffers can be assigned by grouping contiguous
2KB buffers together and treating them as one buffer; the pointer to this larger buffer 620 will
still be 2KB aligned and the RX Status Word 600 and RX Timestamp 602 will still reside at that
location in the buffer. The packet area 610 and 612 can be made arbitrarily large to
accommodate a packet from a different medium. The location of the software data structure 614
can be moved downwards as the larger payload space is allocated. Alternatively the software
can choose to allocate buffers so that they have space before the 2KB aligned RX Status Word
600, and carry the software data structure 614 above the RX Status Word 600 rather than below
the Payload 612 as shown in FIG 7. The advantage of this second approach is that the location
of the software data structure is always known to be at a fixed location relative to the RX Status
Word 600, rather than having that location be a variable depending on different media and the
resulting variations in the size of the packet payload 612.

The section marked "Available for software use" contains transient per packet
information such as the result vector and hash pointers output by the Classification Engine, a
command descriptor for the Crypto Unit, buffer reference counts, an optional pointer to an
extension buffer, and any other data structures that the software defines. "TX Status/TX
Timestamp" is optionally written by the transmit MAC if it is programmed to do so; that field
contains garbage after an RX.

The "RX Timestamp" field contains the 32 bit value of the chip's TIMER register at the
time that the packet was successfully received (approximately the time of receipt of the end of
packet) and the RX_STATUS field was written. The "RX Status" field is one 32 bit word with
the following format:

Note throughout this document that bit [31] is the left (most significant) bit of a 32 bit
word, and bit [0] is right (least significant). "MCSR" mentioned in Table 1, below, is the MAC
Control and Status Register.



Table 1: Ethernet RX Status Word and TX Command Word Format

Bits          Field        Description

(illegible)   BAD_PKT      Summary error bit; set if any of [30:27, 15:11] is set, which can
                           only happen if the MAC is programmed to receive bad frames.

(illegible)   CRC_ERR      Ethernet frame had incorrect CRC and (MCSR[RCV_BAD] = 1) for this
                           MAC.

(illegible)   RUNT         Ethernet frame was smaller than legal and (MCSR[RCV_BAD] = 1) for
                           this MAC.

(illegible)   GIANT        Ethernet frame was larger than legal and (MCSR[RCV_BAD] = 1) for
                           this MAC.

(illegible)   PREAMB_ERR   Invalid preamble and (MCSR[RCV_BAD] = 1) for this MAC. This error
                           is associated with some previous event, not with the current packet.

[26:16]       LENGTH       For RX, number of bytes in the Ethernet frame including the Ethernet
                           header but not including the Ethernet CRC. For TX, length of packet,
                           including CRC if (MCSR[CRC_EN] = 0).

(illegible)   DRBLERR      Odd number of nibbles received (dribble) and (MCSR[RCV_BAD] = 1)
                           for this MAC.

(illegible)   CODE_ERR     4b5b encoding error and (MCSR[RCV_BAD] = 1) for this MAC.

(illegible)   BCAST        The received packet was a broadcast packet (destination address is
                           all 1's).

(illegible)   MCAST        The received packet was a multicast packet and was passed by the
                           multicast hash function.

(illegible)   SOURCE       This indicates the source of the packet or other source as marked
                           later by software. If the packet was generated at a RX MAC then this
                           field is 0x0 for MAC_A or 0x1 for MAC_B.

(illegible)   PKT_OFFSET   This is the byte offset from the beginning of the packet buffer to
                           the first byte of the Ethernet header. Other agents may choose to
                           move this offset in order to encapsulate the IP packet or to strip
                           off encapsulation headers. The CE, PP, and AP all use this offset
                           when accessing the frame in this buffer. The RX MAC will always
                           write a value of 0x82 into this field, indicating that the Ethernet
                           Frame was received into the buffer starting at byte offset 130 from
                           the start of the buffer.



The same packet buffer format is used for encryption and transmission; for those uses the only
meaningful fields are LENGTH, PKT_OFFSET and the contents of the Ethernet frame found at
that offset; plus for encryption the encryption descriptor included in the "Software" area in the
buffer.

3. TX Buffer Pointer Rings and Producer/Consumer Pointers

A packet gets scheduled for transmission by enqueueing the address of the buffer onto the
pointer queue for that transmit MAC, by writing it to MTPROD in the RTU (MAC A and MAC
B each have their own ring and associated registers). Any time the produce pointer is not equal
to the consume pointer for that ring, the associated MAC will be notified that there is at least one
packet to transmit and will follow the pointer to obtain the next buffer to deal with. When the
packet has been retired the TX controller will write back status if configured to do so, then
increment the consume pointer and continue to the next buffer (if any).

The recover pointer is used to track retired buffers (either successfully transmitted or
abandoned due to transmit termination conditions) for return to the buffer pool, or possibly for a
retransmit attempt; the PP is signaled by the RTU that there is a delta between MTCONS and
MTRECOV, and then reads the Ring through the RTU register MTRECOV to get the pointer to
the next buffer to recover. MTPROD, MTCONS, and MTRECOV are duplicated for each
instance of a transmit MAC.

FIG. 8 illustrates the TX Ring Structure according to certain embodiments of the present
invention.

The TX Rings 406 and 408 have substantially the same structure as the RX Rings
described previously. The fundamental differences are that there is one fewer interim producer
consumer using this ring, and that this ring is assigned for a different function with different
agents using it. Each ring 406 and 408 is a 4096 byte array 720 in memory 260.

A packet is scheduled for transmit on the TX MACs 222 or 232 by enqueuing a pointer to
the buffer containing the packet onto TX Ring 406 or 408, respectively. The buffer pointer is
enqueued onto 406 or 408 by any agent, by writing the buffer pointer to the RTU 264 enqueue
address for that ring. The RTU 264 writes the buffer pointer to the location in memory 260
referenced by the MTPROD index register 716, and then increments MTPROD 716 modulo the
ring size of 4096 bytes. There is a producer consumer relationship between MTPROD 716 and
MTCONS 714; when the RTU detects a difference in the values of MTPROD 716 and MTCONS
714 it signals to the associated TX MAC controller 222 or 232 that there is a packet ready to
transmit. The region 706 in the TX Ring 406 or 408 contains one or more buffer pointers for the
buffers containing packets scheduled for transmission.

The TX MAC controller 222 or 232 obtains the buffer pointer for the buffer 206
containing this packet by reading the RTU's MTCONS address for TX Ring 406 or 408,
respectively, which causes the RTU to return to the MAC the buffer pointer in memory 260
referenced by MTCONS 714. When the TX MAC 218 or 234 has successfully transmitted this
packet or has abandoned transmitting this packet due to transmit termination conditions, its
controller 222 or 232 respectively will optionally write back TX Status 806 and TX Timestamp
808 if it has been configured to write status, then retires the buffer by signaling to the RTU 264
to increment MTCONS 714. Upon receiving this signal the RTU 264 will increment MTCONS
714 modulo the ring size of 4096 bytes.

Index registers MTCONS 714 and MTRECOV 712 have a producer consumer
relationship. When the RTU detects a difference in their values, it signals to the PP that the
associated TX ring 406 or 408 has a retired buffer to recover. That information is visible to the
Policy Processor 244 in a status register in Processor Interface 206 which the Policy Processor
244 polls on occasion to see what work it needs to dispatch. Upon testing the RECOVER status
for the TX Ring 406 or 408 and detecting that there is at least one buffer to recover, the Buffer
Recovery code 118 reads the RTU's 264 MTRECOV address for that ring to dequeue the buffer
pointer from the TX ring 406 or 408. The read causes the RTU to return the buffer pointer
referenced by MTRECOV 712, and then to increment MTRECOV 712 modulo the ring size of
4096 bytes. The region 704 contains the buffer pointers of buffers which have been retired by
the TX MAC 222 or 232 but have not yet been recovered by the Buffer Recovery code 118.

The regions 702 and 708 are the same region, which in the figure shown is spanning the
end and the beginning of the array 720 in memory 260 which contains the TX Ring 406 or 408.
This region contains entries which are neither a buffer pointer to a buffer ready for transmit, nor
a buffer pointer to a buffer which the TX MAC 222 or 232 has retired but the recovery code 118
has not yet dequeued. For the purposes of a TX Ring 406 or 408 this region consists of space
into which more packets may be scheduled for transmit. One skilled in the art will recognize
that region 704 or region 706 could just as easily be the region wrapping around the array
boundary, depending on the values of MTRECOV 712, MTCONS 714, and MTPROD 716.
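
The three index registers of a TX ring and the regions they delimit can be modelled in the same
way as the RX ring. The sketch below is illustrative only: the class TxRing and its method names
are hypothetical, indices are free running counters rather than wrapped registers, and the ring is
assumed to hold 1024 buffer pointers.

```python
class TxRing:
    """Toy model of one TX ring with MTPROD, MTCONS, and MTRECOV."""

    SLOTS = 1024  # assumed: 4096 byte ring of 32 bit buffer pointers

    def __init__(self):
        self.slots = [None] * self.SLOTS
        self.mtprod = self.mtcons = self.mtrecov = 0

    def schedule(self, buffer_pointer):
        """Any agent enqueues a buffer for transmit (write through MTPROD)."""
        if self.mtprod - self.mtrecov == self.SLOTS:
            raise OverflowError("ring full")
        self.slots[self.mtprod % self.SLOTS] = buffer_pointer
        self.mtprod += 1

    def mac_next(self):
        """TX MAC reads the pointer at MTCONS; None when nothing is scheduled."""
        if self.mtcons == self.mtprod:
            return None
        return self.slots[self.mtcons % self.SLOTS]

    def mac_retire(self):
        """TX MAC signals that the current buffer was sent or abandoned."""
        if self.mtcons == self.mtprod:
            raise LookupError("nothing scheduled")
        self.mtcons += 1

    def recover(self):
        """Buffer recovery code dequeues a retired buffer (read through MTRECOV)."""
        if self.mtrecov == self.mtcons:
            raise LookupError("nothing retired")
        pointer = self.slots[self.mtrecov % self.SLOTS]
        self.mtrecov += 1
        return pointer

    def regions(self):
        """Entry counts for the regions named in the description of FIG 8."""
        return {
            "scheduled_706": self.mtprod - self.mtcons,
            "retired_704": self.mtcons - self.mtrecov,
            "free_702_708": self.SLOTS - (self.mtprod - self.mtrecov),
        }
```

Compared with the RX ring model there is one index fewer, which corresponds to the statement
that the TX ring has one fewer interim producer consumer.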

Embedded in the buffer is the packet length in bytes (including the Ethernet header, but
not including the CRC since the TX MAC will generate that) and also the byte offset within the
buffer where the Ethernet header begins. The offset is necessary since the start of packet might
have been moved back (if adding encapsulation headers) or forward (if decapsulating a packet).
The Ethernet header typically starts at byte offset 0x2 within that word, but the TX MAC
supports arbitrary byte alignment. PKT_OFFSET and LENGTH are found in the "RX Status"
and "TX Command" word of the buffer as described in Table 1; for transmit purposes those are
the only two meaningful fields in that word.

The area labeled "TX Status/TX Timestamp" is optionally written with one word of
transmit status plus the value of TIMER, at the time the field is written, if MCSR[TX_STAT] is
set; the content of that word is described in Table 2.

FIG 9 illustrates the transmit buffer format according to certain embodiments of the
present invention.

When a packet is scheduled through TX Ring 406 or 408 to be transmitted on a TX MAC
218 or 234, respectively, the TX MAC controller 222 or 232, respectively, interprets the contents
of the packet buffer 840 in accordance with the format shown in FIG 9. The RX Status Word
and TX Command Word 802 is found at the location pointed to by the 2KB aligned buffer
pointer obtained from the TX Ring 406 or 408. The RX Status and TX Command Word 802 is in
the format specified by Table 1; when this word is interpreted by the TX MAC controller 222 or
232 only the fields LENGTH and PKT_OFFSET have any meaning and the rest of the word is
ignored. PKT_OFFSET indicates the byte offset from the start of the 2KB aligned buffer at
which the first byte of the Ethernet header is to be found, and LENGTH is the number of bytes to
be transmitted not including the (4 byte) Ethernet CRC which the TX MAC 222 or 232 will
generate and append to the packet as it is being transmitted. The RX Timestamp 804 was used
by previous agents processing this buffer, and is not interpreted by the TX MAC controller 222
or 232.

The PKT_OFFSET field can legitimately have any value between (16) and (255),
allowing the agent that scheduled the transmit to manipulate headers and to relocate the start of
the packet header 812 as needed. FIG 9 shows a zero filled two byte pad 830 prior to the start
of Ether Header 812, but that is not a requirement of the preferred embodiment; the TX MAC
222 or 232 can transmit a packet which starts at any arbitrary byte alignment in the transmit
buffer 840. The two byte pad 830 shown preceding the header 812 is shown to illustrate the
common case, wherein a received packet was thus aligned and any movement of the Ethernet
header 812 for encapsulation or decapsulation of protocols is in units of words (4 bytes). Pad
Space 810 can vary in size from zero bytes to (240) bytes as defined by the value of
PKT_OFFSET in the TX Command Word 802.

The concatenation of Ether Header 812 and Payload 814 comprises the packet that is
transmitted, along with the generated Ethernet CRC which the TX MAC 222 or 232 appends
during transmit. The Ethernet CRC field 816 is not normally used by the TX MAC 218 or 234,
but was written there during receive by the RX MAC 220 or 228. Each TX MAC controller 222
and 232 has a configuration setting which can instruct it to not generate CRC as it transmits; in
that case the LENGTH field in the TX Command Word 802 includes the four bytes of Ethernet
CRC, and the data in 816 is sent with the packet for use as the packet's CRC. This configuration
which uses software generated Ethernet CRC is provided primarily as a diagnostic tool for
sending bad packets to other devices on the network.
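
The relationship between the LENGTH field and the number of bytes placed on the wire, with
and without MAC generated CRC, can be summarized in a short sketch. The function names
rx_length_field and bytes_transmitted are hypothetical and the sketch assumes the 4 byte
Ethernet CRC described above.

```python
ETH_CRC_BYTES = 4


def rx_length_field(received_frame_bytes):
    """LENGTH as stored on receive: the frame length without its CRC."""
    if received_frame_bytes < ETH_CRC_BYTES:
        raise ValueError("frame shorter than its CRC")
    return received_frame_bytes - ETH_CRC_BYTES


def bytes_transmitted(length_field, mac_generates_crc=True):
    """Bytes sent for a given LENGTH.

    With MAC generated CRC the MAC appends four bytes; with software
    generated CRC the four CRC bytes are already counted in LENGTH.
    """
    if mac_generates_crc:
        return length_field + ETH_CRC_BYTES
    return length_field
```

A received frame of 64 bytes therefore yields a LENGTH of 60, and forwarding that buffer
unmodified with MAC generated CRC again produces a 64 byte frame.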

Upon completion or abandonment of a transmit, the TX MAC will write back the TX
Status Word 806 and the TX Timestamp 808 if it is so configured. The TX Status Word 806
contains the information and format shown in Table 2. The TX Timestamp 808 is written with
the value of the Timestamp Register 214 at the time the write to TX Timestamp 808 is initiated.

The software data structure 820 which travels in the packet buffer 840 along with the
packet is the same one 614 discussed in the description of an RX buffer 620 as shown in FIG. 7,
and may be relocated by software convention as described in the discussion of FIG. 7.

The transmit status word 806 contains a flag indicating if the transmission was
successful, and the reason for failure if the transmit was abandoned. This field is written only if
MCSR[TX_STAT] is set, otherwise the fields 806 and 808 contain uninitialized data.



Table 2: Ethernet TX Status Word

Bits          Field           Description

(illegible)   TX_OK           Packet was successfully transmitted.

(illegible)   LATE_COL        Transmit abandoned due to a late collision (only if
                              (MCSR[LATE_COL_RTRY] = 0)).

(illegible)   XS_COL          Transmit abandoned due to excessive collisions (16 collisions).

(illegible)   XS_DEFER        Transmit abandoned due to excessive deferrals.

(illegible)   UNDERFLOW       Transmit abandoned due to slow memory response times.

(illegible)   GIANT           Packet length was larger than legal.

[25:22]       COL_CNT[3:0]    Number of collisions experienced (never shows more than 15; if
                              XS_COL this value is (illegible)).

(illegible)   reserved        MAC writes 0x0 to this field.

(illegible)   TX_SIZE[10:0]   Number of bytes transmitted (includes the 4 byte Ethernet CRC).





There are 5 possible transmit packet sources sharing the TX MAC; these are
• The RISC processor (Policy Processor) generating or forwarding a packet
• Crypto generating a modified packet
• The AP either creating, forwarding, or modifying a packet
• A device in a PCI expansion slot creating, forwarding, or modifying a packet
• A peer PE forwarding a packet to a different network segment (e.g. for routing or
switching)

Atomic enqueueing by multiple sources is supported via writes to RTU[MTPROD]
associated with that MAC's Transmit Ring. The RTU can detect high water mark conditions and
signal the situation to the PP and the AP. The MTCONS index pointer is incremented by the
MAC whenever a buffer is retired; that is chased by another consume pointer incremented by
reads of RTU[MTRECOV] which is used by the PP for recovery of retired packet buffers to the
buffer pool and (optionally) checking TX status.

4. Reclassify Rings

The Classification Engine receives packets to classify from both the RX MAC (via the
RX Ring), and from other sources (PP, AP, Crypto, and potentially other network cards on the
PCI bus). A second input ring (Reclassify Ring) is provided for each CE for these other sources
to schedule a packet for classification on that CE; each comprises a ring in memory with enqueue
and dequeue operations supported through the RTU. The 32 bit entries in the ring are buffer
pointers.

FIG 10 shows th e reclassify ring structur e . 

Th e Reclassify Rings 4 10, 4 12, 1 1 1, and 4 1 6 serv e a very similar purpose to the RX 
Rings 102 and 401, and have substantially the sam e structure. Th e substantive differ e nc e s ar e 
that there is one less int e rim consum e r produc e r in the R e classify Rings, and that packets g e t 
sch e dul e d through th e Reclassify Rings via a diff e rent path. R e classify Rings 1 10, 1 12, 1 1 1, and 
4 16 ar e used to schedule packets for proc e ssing on CE 238, 208, 212, and 212 r e spectiv e ly. 

In the case of the RX Ring 402 or 404, buffer pointers are enqueued by the Buffer Allocation process 102 running on the Policy Processor 244 using MPROD 518, which allocates the referenced buffers as free and empty for the RX MAC 220 or 228, respectively, to consume using MFILL 516 when receiving a packet and to produce a full, unclassified buffer to the CE 238 or 242, respectively. Packets scheduled for classification via the Reclassify Rings 410, 412, 414, and 416 come from a source other than the RX MACs 220 or 228, as illustrated in FIG. 2. Full, unclassified buffers get scheduled onto one of the Reclassify Rings when an agent enqueues the buffer pointer onto the ring by writing the buffer pointer to the RTU's 264 enqueue address, which causes the RTU 264 to write the buffer pointer to the location in memory 260 referenced by RPROD 916 and then to increment RPROD 916 modulo the ring size of 4096 bytes.

From that point onward the description is substantially the same as the description of the RX Ring 402 and 404, except that RCCONS 914 is used in place of MCCONS 514, RPCONS 912 is used in place of MPCONS 512, the invalid region 902 and 908 substitutes for 500 and 508, Full and Classified 904 substitutes for 502, and Full Unclassified 906 replaces 504. Since this flow has no allocation of empty buffers there is no equivalent to MFILL 516 nor to Valid Empty 506.

Note that the "Outbound" classifiers 208 and 212 each have only a Reclassify Ring, 412 and 416 respectively, but no RX Ring since they are not associated with an RX MAC.

Crypto Command Queue and General Purpose Communications Rings

In order to schedule buffers for processing by the external (and optional) encryption engine, another memory based ring containing buffer pointers is implemented, with enqueue and dequeue operations supported through the RTU for the Crypto unit to get the next buffer to process, plus a status bit indicating to Crypto that there is at least one packet buffer pointer in the ring to process. The information about what operations to perform, keys, etc. is embedded in a Crypto Command Descriptor in the software area of the buffer.

FIG. 11 shows the Crypto Ring and COM[4:0] Rings Structures.

The Crypto Ring 420, COM0 Ring 422, COM1 Ring 424, COM2 Ring 426, COM3 Ring 428, and COM4 Ring 430 are identical in structure. Any agent can enqueue a buffer pointer or, in the case of the COM Rings, any 32 bit datum, by writing to the RTU's 264 enqueue address associated with the particular ring. This causes the RTU to store the buffer pointer or 32 bit datum to the location in memory 260 referenced by the specified PRODUCE Pointer 1010 and then to increment PRODUCE 1010 modulo the ring size of 4096 bytes. There is a producer-consumer relationship between a particular ring's PRODUCE pointer 1010 and that ring's CONSUME pointer 1008. When the RTU detects a difference between the values of PRODUCE 1010 and CONSUME 1008 it signals to the consuming unit that there is at least one entry to be consumed.

The consumer dequeues a 32 bit entry from one of these rings by reading from the RTU's dequeue address associated with that particular ring; this causes the RTU to return the data at the address in memory 260 referenced by that CONSUME pointer 1008 and then to increment CONSUME 1008 modulo the ring size of 4096 bytes. As is illustrated here, the degenerate case of the multiple producer, multiple consumer ring structure described in figures 6, 8, and 10 is a single producer, single consumer FIFO with FIFO-not-empty status presented to the consumer. The COM rings 422, 424, 426, and 428 all report ring-not-empty status and (programmably per ring) either near full or near empty threshold status to the Policy Processor 244 through status registers in the processor interface 206. These rings can be assigned for any purpose; anticipated uses include a message-in ring for the Policy Processor 244, a ring for allocating buffers for use by remote agents, and a ring for allocating DMA descriptors for use by remote agents scheduling this Policy Engine's DMA Unit 210.
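
The enqueue and dequeue behavior described above can be modeled in software. The following is a minimal Python sketch and not the RTU hardware: the class name `ComRing`, its method names, the use of a Python list in place of memory 260, and the treatment of the pointers as byte offsets are all assumptions made for illustration. Only the 4096 byte ring size, the 32 bit entries, and the comparison of the produce and consume pointers come from the description. The sketch does not model overflow or the near full / near empty thresholds.

```python
RING_BYTES = 4096          # ring size stated in the text
ENTRY_BYTES = 4            # each entry is a 32 bit datum


class ComRing:
    """Hypothetical software model of one Crypto/COM ring behind the RTU."""

    def __init__(self):
        self.mem = [0] * (RING_BYTES // ENTRY_BYTES)  # stands in for ring memory
        self.produce = 0   # byte offset, plays the role of the PRODUCE pointer
        self.consume = 0   # byte offset, plays the role of the CONSUME pointer

    def enqueue(self, datum):
        """Model a write to the ring's RTU enqueue address."""
        self.mem[self.produce // ENTRY_BYTES] = datum & 0xFFFFFFFF
        self.produce = (self.produce + ENTRY_BYTES) % RING_BYTES

    def not_empty(self):
        """Status signaled to the consumer: PRODUCE differs from CONSUME."""
        return self.produce != self.consume

    def dequeue(self):
        """Model a read from the ring's RTU dequeue address."""
        datum = self.mem[self.consume // ENTRY_BYTES]
        self.consume = (self.consume + ENTRY_BYTES) % RING_BYTES
        return datum


ring = ComRing()
ring.enqueue(0x00012800)   # illustrative buffer pointers
ring.enqueue(0x00013000)
first = ring.dequeue()     # entries come back in FIFO order
```

Because each enqueue is a single write to one address, several producers can share the ring without a software lock, which is the property the text relies on.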

The Crypto Ring 420 reports ring-not-empty status to the Crypto Processor 246 through a status register in Crypto Interface 202. COM4 430 also reports ring-not-empty status through a similar location, so that COM4 430 can optionally be used to support scheduling packets for processing by a second Crypto Processor 246. The Crypto Processor Interface 202 has additional support for a second Crypto Processor 246, which might be added to provide either more bandwidth for encryption processing or additional functionality such as compression. Packets would be scheduled for processing on this second processor 246 by enqueueing their buffer pointers onto COM4 430. Alternatively, both the Crypto Ring 420 and COM4 430 can be used to schedule buffers for processing on the one Crypto processor 246.

The general purpose communication rings COM[4:0] 422, 424, 426, 428, and 430 are identical in structure to the Crypto Ring 420.

DMA Command Queue and Descriptors

The DMA engine also uses a ring unit with an Enqueue register for any agent to schedule DMA transfers (DMA_PROD), a Consume register for the DMA engine to get entries from the ring (DMA_CONS), and a Dequeue register for recovering retired descriptors (and the associated buffers) from the ring (DMA_RECOV).

The DMA engine is used to move data between the memory and the PCIbus; the source/target on PCI can be host (AP) memory or another PCI device. DMA operations are scheduled by creating a 16 byte descriptor in memory and then enqueueing the address of that descriptor in the DMA engine's command ring by writing it to DMA_PROD. The PP, the host, a PCI bus peer, and Crypto can atomically schedule use of this engine.

DMA is notified by the RTU when the Produce pointer is not equal to the Consume pointer and processes the next descriptor. When that descriptor is retired, DMA increments the Consume pointer; a delta between that and the Recover pointer causes the RTU to signal to the PP that there are DMA descriptors (and the associated buffer pointers) to recover.

Table 3: DMA Descriptor Format

PCI_Address [31:00]
Flags [31:0]
S1 [31:27] | Buf_Address [26:11] | S2 [10:0] (pointer tag field)
S3 [15:11] | Buf_Start_Index [10:2] | (unused) | Word_Count [15:0]



The areas labeled "S2" and "S3" are available for software use. "S1" is reserved for future expansion of PE memory size.

Upon completion of a transfer, the DMA engine can optionally set a completion status bit in either the Host Interrupt Register or Processor Interrupt Status Register in case the initiating agent wants completion status of a transfer or group of transfers. 8 bits are provided in each so that transfers can be tagged as desired. This allows both AP and PP software to have up to 8 DMA completion events scheduled at one time for tracking when particular groups of transfers have completed, or for the PP to signal to the AP that information has been pushed up to a mailbox or communication ring in AP memory, or for similar signals from the AP to the PP.

The Packet Buffer Address field contains the packet buffer pointer in the same format that is used by all other agents in the Policy Engine; this means that bits [10:0] are ignored by hardware and might contain tag information. The actual memory word address is the concatenation of the 2 KB aligned Packet_Buffer_Address[31:11] with Start_Index[10:2], with 00 in the lower two bits. Note that the Word Count allows for a maximum DMA transfer of 64K-1 words (256K-4 bytes), in case there are transfers larger than normal packet buffer movement (e.g. moving down PP code or CE microcode).
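
The address concatenation and the transfer limit can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function name `dma_word_address` is hypothetical, and the start index is treated here as a 9 bit word index that supplies address bits [10:2], which is one reading of the field name.

```python
def dma_word_address(buf_pointer, start_index):
    """Combine a packet buffer pointer with a 9 bit word index.

    Bits [10:0] of the pointer are ignored (they may hold a tag), the
    start index supplies address bits [10:2], and bits [1:0] are 00.
    """
    base = buf_pointer & 0xFFFFF800            # 2 KB aligned bits [31:11]
    return base | ((start_index & 0x1FF) << 2)


MAX_WORDS = (1 << 16) - 1                      # largest 16 bit Word_Count
MAX_BYTES = MAX_WORDS * 4                      # 4 bytes per 32 bit word

addr = dma_word_address(0x00012BAD, 3)         # tag bits 0x3AD are dropped
```

With a 16 bit word count the largest transfer is 65535 words, which is 262140 bytes, that is, 256K minus 4.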

The Flags word contains the following fields:

Table 3a: DMA Descriptor "Flags" Word

Bits      Field          Description
[31:21]                  Available for software use.
[20]      TO_MEM         Direction: 1 = To Memory (From PCI), 0 = From Memory (To PCI)
[19:16]   PCI_CMD[3:0]   This is the PCI command code which is used on the PCI bus for these
                         transactions; the most common codes will be 0x7 (Memory Write) and 0x6
                         (Memory Read), with some probability of also using 0xC (Memory Read
                         Multiple) and 0xE (Memory Read Line) if the attached host uses them for
                         prefetch directives.
[15:08]   SET_HISR[7:0]  Any bit that is set will set the corresponding status bit in the HISR upon
                         retirement of this descriptor. If no bit is set, no status is sent to HISR.
[07:00]   SET_PISR[7:0]  Any bit that is set will set the corresponding status bit in the PISR upon
                         retirement of this descriptor. If no bit is set, no status is sent to PISR.
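
As an illustration of how software might assemble a Flags word, the sketch below packs the fields listed in the table. The function and constant names are hypothetical. It also assumes that TO_MEM occupies bit 20, the only bit left between the [31:21] and [19:16] fields; the 0x6 and 0x7 command codes are the ones named in the table.

```python
PCI_CMD_MEM_READ = 0x6     # used when data moves to PE memory (from PCI)
PCI_CMD_MEM_WRITE = 0x7    # used when data moves from PE memory (to PCI)


def pack_flags(to_mem, pci_cmd, set_hisr=0, set_pisr=0, software=0):
    """Pack the Flags fields into one 32 bit word."""
    return (((software & 0x7FF) << 21)        # [31:21] software use
            | ((1 if to_mem else 0) << 20)    # [20]    TO_MEM (assumed position)
            | ((pci_cmd & 0xF) << 16)         # [19:16] PCI_CMD[3:0]
            | ((set_hisr & 0xFF) << 8)        # [15:08] SET_HISR[7:0]
            | (set_pisr & 0xFF))              # [07:00] SET_PISR[7:0]


# Copy from PE memory to the host and set HISR status bit 0 on retirement.
flags = pack_flags(to_mem=False, pci_cmd=PCI_CMD_MEM_WRITE, set_hisr=0x01)
```

Leaving both status fields at zero requests no completion status at all, matching the "if no bit is set" rule in the table.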



Since DMA descriptors are read from memory by the DMA engine, software must ensure either that the descriptors are non-cacheable by the processor, or that they are flushed from the PP cache prior to writing the descriptor's address to the DMA ring.

For descriptors that are generated by the AP or by a PCI peer, see "Endianness" in section 8 for details about descriptor endianness.

FIG. 12 shows the DMA Ring Structure. 

The DMA Ring 418 is substantially the same as the TX Rings 406 and 408 as described in figure 8. There is a single enqueue index DMA_PROD 1116 used to schedule pointers on the ring 418 by any agent, an interim consumer-producer index DMA_CONS 1114 used by the DMA Unit 120 to consume newly scheduled descriptor pointers and to produce retired descriptor pointers, and a dequeue index DMA_RECOV 1112 used by the Policy Processor 244 to recover retired descriptors as well as the buffers associated with them using the buffer pointer embedded in the DMA descriptor being recovered. Differences between DMA_PROD 1116 and DMA_CONS 1114 are detected by the RTU 264 and reported to the DMA Unit 120. Differences between DMA_CONS 1114 and DMA_RECOV 1112 are reported by the RTU 264 to the Policy Processor 244 through a status bit in the Processor Interface 206. Region 1106 contains one or more descriptor pointers which point to DMA descriptors as described in Table 3. Region 1104 contains the descriptor pointers of descriptors which have been retired by DMA 120 but have not yet been recovered by Buffer Recovery 118. Invalid 1102 and 1108 are the unused space into which more pointers can be scheduled.
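
The way the three indexes chase one another around the ring can be sketched as follows. This is an illustrative model only: the class `DmaRing`, its method names, and the default of 1024 entries are assumptions, and the real indexes are maintained by the RTU rather than by software.

```python
class DmaRing:
    """Hypothetical model of a ring with produce, consume and recover indexes."""

    def __init__(self, entries=1024):
        self.slots = [None] * entries
        self.entries = entries
        self.prod = 0     # like DMA_PROD: advanced by any enqueueing agent
        self.cons = 0     # like DMA_CONS: advanced by the DMA unit on retirement
        self.recov = 0    # like DMA_RECOV: advanced by the PP on recovery

    def enqueue(self, desc_ptr):
        self.slots[self.prod] = desc_ptr
        self.prod = (self.prod + 1) % self.entries

    def work_pending(self):
        """Condition reported to the DMA unit."""
        return self.prod != self.cons

    def retire_next(self):
        """The DMA unit has finished the oldest scheduled descriptor."""
        desc_ptr = self.slots[self.cons]
        self.cons = (self.cons + 1) % self.entries
        return desc_ptr

    def recover_pending(self):
        """Condition reported to the PP through a status bit."""
        return self.cons != self.recov

    def recover(self):
        """The PP reclaims the oldest retired descriptor pointer."""
        desc_ptr = self.slots[self.recov]
        self.recov = (self.recov + 1) % self.entries
        return desc_ptr


ring = DmaRing()
ring.enqueue(0x00014000)               # any agent schedules a descriptor
pending_before = ring.work_pending()
retired = ring.retire_next()           # DMA unit completes the transfer
recover_before = ring.recover_pending()
recovered = ring.recover()             # PP reclaims descriptor and buffer
```

Entries between the recover and consume indexes correspond to the retired region, and entries between the consume and produce indexes correspond to the scheduled region.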

Buffer Allocation/Flow

At initialization time the software allocates a pool of size aligned 2 KB buffers in memory. Enough of these are allocated to each of the RX rings (that is, the buffer pointers are enqueued on those rings by writing them to the associated RTU[MPROD]) to provide the desired elasticity for the RX MAC, and the rest are placed on a freelist (e.g. on a software managed linked list). Each time the PP dequeues a buffer from the RX ring it can allocate a new empty buffer from the freelist, thus keeping the pool size constant. Buffers that go through Crypto may be enqueued by any agent and are dequeued by the Crypto Processor which will then enqueue them on the specified destination ring after processing. Buffers that are scheduled for DMA are recovered at the same time the associated DMA descriptor is recovered from the ring. Buffers may be temporarily absorbed by an application if it is queueing packets for delay. A reference count can be maintained in buffers which go to multiple readers so that they retire only when all readers have retired them.
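
The reference count idea can be sketched in a few lines. The names `PacketBuffer` and `release` are hypothetical, and a Python list stands in for the software managed freelist; the text does not prescribe where in the buffer the count lives.

```python
class PacketBuffer:
    """Hypothetical buffer record carrying a software reference count."""

    def __init__(self, pointer, readers):
        self.pointer = pointer
        self.refcount = readers


def release(buf, freelist):
    """One reader is done; recycle the buffer only when the last one finishes."""
    buf.refcount -= 1
    if buf.refcount == 0:
        freelist.append(buf.pointer)
        return True
    return False


freelist = []
buf = PacketBuffer(pointer=0x00012800, readers=2)  # two readers share the buffer
first_done = release(buf, freelist)    # first reader retires it
second_done = release(buf, freelist)   # last reader retires it, buffer recovered
```

The buffer returns to the freelist exactly once, regardless of which reader finishes last.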

The goal is that the PP can handle buffer allocation and recovery through the read of status bits in the PISR, reads of RTU recover or dequeue addresses to recover retired buffers when the RTU indicates through the PISR that the particular rings have buffers to recover, and writes to ring RTU enqueue addresses to allocate new buffers. It is a primary goal that copying of buffers is avoided except when absolutely necessary.

Rings report threshold warnings to the PP/AP through the CRISIS register when there is danger of under/overflowing (within 1/4 ring size of a problem situation) and also report full/empty status of rings through bits in the CRISIS Register as appropriate.

The Life of an RX Packet Buffer

Ideally, a packet arrives into a buffer, gets processed, and then gets transmitted out the other port or gets dropped. Processing may include a decision by the application to enqueue the buffer for temporary delay (and possible later dropping), to feed a packet through the local optional Crypto for encryption work, or to pass a packet to the AP or external coprocessor (see Figure 4). The key concept is to think of a packet as being "owned" by some agent, and that agent taking responsibility for the final disposition of the packet.

Flow of a Buffer Which Remains Local

At the beginning of time the system allocates a number of buffers to an RX MAC by writing their pointers into that RX Ring's RTU[MPROD] enqueue register, which presents these buffers to that MAC as empty/allocated. These buffers are now owned by that RX MAC, and cannot be touched by others until the MAC has so indicated. When the RX MAC has filled a buffer with a newly received packet it passes ownership to the associated Classification Engine by moving the MFILL pointer to the next entry (buffer pointer) in the ring. The CE will detect this, then process that packet; when it is done it passes ownership to the PP by incrementing the MCCONS index modulo ring size, and then the application(s) running on the PP will determine what action(s) to take. Ownership of a buffer is always explicitly relinquished by the current owner.






The PP can perform any conventional actions with a buffer. Examples of actions for a buffer which remains entirely local are DROP, FORWARD, MODIFY or temporarily ENQUEUE then later FORWARD.

DROP: The code running on the PP determines that there are no further uses for the contents of this buffer, so it retires/recovers the buffer. Typically this occurs when the Action portion of the application(s) running on the PP decides that a packet does not meet the criteria for passing it forward.

FORWARD: The PP enqueues the pointer onto the appropriate TX ring; TX is fire and forget (with optional completion status from the MAC), with the hardware responsible for either completing or abandoning the transmit (that is, the TX MAC owns that buffer). Some time later in the buffer reclamation code, the PP will recognize that the TX MAC has retired this packet (is done with it) since the RTU indicates that there is a delta between MTCONS and MTRECOV, thus ownership of that buffer has transferred back to the PP. The PP then checks TX completion status (if the application(s) care) and recovers the buffer or reschedules the transmit as appropriate.

MODIFY: The application may choose to send the packet through Crypto for processing, may encapsulate/decapsulate the packet, could do address translation, or can do any other modification of the packet that the application directs.

ENQUEUE: The application running on the PP determines that it wants to hold on to the packet for some period of time, after which it will either forward or drop it. Ownership of that buffer stays with the application until it relinquishes it by enqueuing the buffer's pointer on the appropriate TX or Reclassify ring, or by deciding to DROP it, in which case the same path as DROP (above) is followed. In the Enqueue case the average residency of a packet in a memory buffer is much longer than in the simple DROP or FORWARD cases, so if applications are enqueueing packets then care must be taken to allocate a large enough buffer pool.

Buffer Handling for Packets Sent to the PCI Bus

The application(s) on the PP may decide that a packet should be forwarded to the AP either for further processing or because the packet is actually targeted at the AP as the final destination. In either case it is necessary to migrate the packet to buffers in the AP's memory (e.g. into mbufs in the stack running there or into application specific storage). The buffer itself is not migrated; some or all of its contents are copied to a different buffer in host memory. This is done using the DMA engine.

Alternatively the application could choose to store the packet locally (that is, maintain ownership of the buffer) and simply pass a pointer and other information up to the AP. In this case the PP cannot reclaim the buffer until the AP has informed the PP that ownership of the buffer has been released back to the PP.

Other reasons for sending packets up to the PCI bus include a push model peer to peer copy to a different Policy Engine or external coprocessor, and logging of selected packets at the AP. The latter is interesting because it may involve a fork where a packet takes two paths: one to a MAC transmit queue, and a second to the PCI bus. Reclamation of that buffer would require a convergence of completion, that is, a "join" function, before the buffer can be reclaimed (if copying is to be avoided). Software can maintain a reference count in the buffer for this purpose.

Forwarding a packet to the AP can be in the guise of NIC-like behavior or for application specific communication. In either case the packet's buffer pointer is written to a DMA descriptor as the MEM_ADDR, and after the rest of the DMA descriptor is created the pointer to that descriptor is enqueued on the DMA engine's command queue. As with all other queues described so far, the PP has a trailing recover pointer DMA_RECOV and receives status in the PISR from the RTU when there are retired descriptors to recover.

The "NIC" interface as seen in host memory can be arbitrarily complex, but can be as simple as a memory image consisting of a buffer pool and pointer ring with a produce and a consume pointer, all in host memory; the "RX NIC interface" can mean reading a pointer to a free buffer, DMA'ing the entire packet buffer to that location, following that with a DMA of a new value to the "Produce" pointer associated with it, and an interrupt to the host (using one of the bits HISR[DMA_DONE[7:0]]) upon completion of that DMA. More efficient host structures can be implemented without much more complexity. Communication down from the AP can also use the DMA engine and can involve a similar software ring structure in either host or PE memory; messages and/or ring indexes are written by the AP into one of the 16 Mailbox locations provided, which write data to PE memory and set a per mailbox status bit which signals mailbox status through the PISR to the PP.

A peer to peer routing operation with a push model might require a buffer pool in PE memory to be allocated for each peer that will be doing this; then sending a packet to another Policy Engine for transmit is as simple as scheduling a DMA to copy the data from the local buffer to a buffer in this PE's buffer pool on the remote PE, followed by a DMA of the pointer to that buffer (in the "local" pointer format) into RTU[MTPROD] to schedule it for transmit. Later the remote PP will reclaim the buffer some time after the transmit is done, and will send back the pointer (or a "credit" message) by DMA'ing it to this PP's "freelist" ring for that particular peer.

Another more general method of allocating buffers and DMA descriptors to remote masters is to assign one of the general purpose COM rings to contain a freelist of buffer pointers, and a second to contain a freelist of DMA descriptor pointers; any remote master desiring to push data could then simply read the two rings to obtain both a target buffer and a DMA descriptor for scheduling a fill of that buffer.

A "pull" model of communication would have the remote master send only a (PCI) pointer or a descriptor down through either a mailbox or a COM ring allocated for this function, and require the PP to select a buffer from its own pool of buffers allocated for this purpose, using DMA to copy the buffer from the remote memory into local memory, then taking whatever actions are specified for that packet. Ownership of the actual buffer in this case always belongs to the PP.

Placement of the Software Structure in the Buffer

While the hardware defines the location of the receive and transmit control and status words and the location of the packet in the packet buffer, it is only by convention that the software structure resides forward from the 2 KB aligned buffer pointer. A different convention can be used where the software structure of N bytes actually begins N bytes before the 2 KB aligned buffer pointer; in this case the buffers managed and allocated by software are actually (2 KB - N) byte aligned, and the RX status word is placed N bytes into the buffer, which lands it precisely on the 2 KB aligned word where it already goes. Hardware doesn't know the difference, but software can take advantage of such a structure to allow for arbitrary sized packets from any media, which start forward from the RX status word just like the Ethernet packet but may occupy contiguous memory far bigger than an Ethernet packet would. By placing the software structure before the RX status word, the structure does not have to be moved to accommodate larger packets.
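
The alternative convention is easy to see with concrete addresses. In this sketch the function name `layout` and the 64 byte structure size are illustrative assumptions; the only facts taken from the text are the 2 KB alignment of the hardware pointer and the placement of the software structure N bytes before it.

```python
BUF_ALIGN = 2048   # 2 KB alignment of the hardware buffer pointer


def layout(buffer_pointer, n):
    """Return (software structure address, RX status word address)."""
    assert buffer_pointer % BUF_ALIGN == 0
    sw_struct = buffer_pointer - n    # structure begins N bytes before the pointer
    rx_status = sw_struct + n         # N bytes into the software-managed buffer
    return sw_struct, rx_status


sw_struct, rx_status = layout(0x00012800, n=64)
```

The RX status word still sits on the 2 KB aligned address that hardware uses, so the packet can grow forward from it without disturbing the software structure.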

Overview

Internal to the Policy Engine ASIC, all agents are big endian. This includes the MACs, memory, the CEs, the Policy Processor, the Crypto port, and the DMA engine descriptor format. This choice is most convenient for dealing with protocol headers, which are typically big endian native. The CE itself has no endianness since it works only in units of 32 bits throughout; however, it does deal with multibyte data in the way those words are formatted in memory, thus it sees the big endian layout of the packet buffer contents and also writes its status words and hash pointers in big endian format, which is what the PP expects to see.

All PIO accesses from PCI to registers (PCI address range recognized by BAR1) are required to be 32 bit access only. The registers connect to the PCI bus so that bit<0> of the host CPU register is bit<0> of the PE register, and bit<31> corresponds to bit<31>. This implies that bit<0> of a register access travels on bit<0> of the PCIbus. Registers are placed on doubleword boundaries but are accessed as words, and the data travels on bits<31:0> of the PCI bus even if the bus is connecting 64 bit agents. As word only entities the registers have no byte order issue. The same is true of PCI Configuration Register accesses.

All transfers between memory and the PCIbus move data by byte lane; this means that byte<0> in memory travels on byte<0> on the PCIbus, byte<1> on byte<1>, etc. This is endian neutral for byte streams. This applies to all DMA activity, to PIO accesses from the PCIbus to/from memory, and also reads and writes from PCI through the Ring Translation Unit; the rings are simply memory with fancy address translation.

Table 4: Byte Lane Steering, PCI64 to Memory

(byte 7)    (byte 6)    (byte 5)    (byte 4)    (byte 3)    (byte 2)    (byte 1)    (byte 0)
PCI[63:56]  PCI[55:48]  PCI[47:40]  PCI[39:32]  PCI[31:24]  PCI[23:16]  PCI[15:8]   PCI[7:0]
M[7:0]      M[15:8]     M[23:16]    M[31:24]    M[39:32]    M[47:40]    M[55:48]    M[63:56]



Table 5: Byte Lane Steering, PCI32 to Mem

                                     (byte 3)    (byte 2)    (byte 1)    (byte 0)
                                     PCI[31:24]  PCI[23:16]  PCI[15:8]   PCI[7:0]
First data phase (or word at 0x0)    M[39:32]    M[47:40]    M[55:48]    M[63:56]
Second data phase (or word at 0x4)   M[7:0]      M[15:8]     M[23:16]    M[31:24]
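
The effect of this steering on a 32 bit value can be modeled with two small helpers. This is a sketch, not a description of the hardware: the function names are hypothetical, and the host is assumed to be little endian, so the least significant byte of a value travels on PCI[7:0] and lands in memory byte 0.

```python
def pio_write32(host_value):
    """Bytes as they land in PE memory after a 32 bit write over PCI.

    Byte 0 travels on PCI[7:0], byte 1 on PCI[15:8], and so on, so the
    memory image is the value laid out least significant byte first.
    """
    return host_value.to_bytes(4, "little")


def pe_read32(mem_bytes):
    """How a big endian PE agent interprets those four bytes as a word."""
    return int.from_bytes(mem_bytes, "big")


landed = pio_write32(0x11223344)
seen_by_pe = pe_read32(landed)
```

A byte stream is unaffected because each byte keeps its own position; only multibyte quantities appear reversed to the big endian side.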



This byte lane steering has some interesting implications that need to be understood so that it is clear when software will have to twist data. Four interesting cases will be examined: (a) the host writing a DMA descriptor into memory for the DMA engine to consume, (b) the host writing a message to the PP in memory, (c) the PP writing a message in memory that is DMA'd to host memory, and (d) issues surrounding loading of CMEM in the four CE's.

Host Writing a DMA descriptor in memory

The DMA descriptor is not a byte stream, therefore the endian neutral PIO from the host to memory is not sufficient. The DMA engine sees the descriptor as a 16 byte, 16 byte aligned big endian data structure as shown in Table 3. For this example the fields are simplified into a 32 bit PCI address PA, a 32 bit Buffer Address BA, a 16 bit offset OF, a 16 bit Word Count WC, and a 32 bit Flag word F.

Here is the big-endian view of that descriptor as it appears in memory and as the DMA engine interprets it:

Table 6: DMA Descriptor Byte Order, big endian memory

(byte 0)    (byte 1)    (byte 2)    (byte 3)    (byte 4)    (byte 5)    (byte 6)    (byte 7)
PA[31:24]   PA[23:16]   PA[15:08]   PA[07:00]   F[31:24]    F[23:16]    F[15:08]    F[07:00]
BA[31:24]   BA[23:16]   BA[15:8]    BA[7:0]     OF[15:08]   OF[7:0]     WC[15:8]    WC[7:0]



Assuming that the host (AP) will write to this data structure in PE memory using word PIO's over PCI (for the example shown), the host must prescramble those words so that the data will arrive in the correct byte lanes:

Table 7: DMA Descriptor Byte Order, little endian register

                                  (byte 3)    (byte 2)    (byte 1)    (byte 0)
First data phase (word at 0x0)    PA[07:00]   PA[15:08]   PA[23:16]   PA[31:24]
Second data phase (word at 0x4)   F[07:00]    F[15:08]    F[23:16]    F[31:24]
Third data phase (word at 0x8)    BA[7:0]     BA[15:8]    BA[23:16]   BA[31:24]
Fourth data phase (word at 0xC)   WC[07:00]   WC[15:08]   OF[7:0]     OF[15:8]

and then when the host writes the address of the descriptor into the DMA ring (which is "byte lane" memory), that descriptor pointer is written as a word with the following content:

Table 8: Descriptor Pointer Byte Order, little endian register

(byte 3)        (byte 2)        (byte 1)        (byte 0)
DESC_A[07:00]   DESC_A[15:08]   DESC_A[23:16]   DESC_A[31:24]

Note that reads and writes through the ring unit are accesses to memory, not to registers, which is why the address shuffle (where "the address" is data, as above) is required when the host is writing to the ring enqueue address.
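
The prescrambling step can be illustrated with Python's `struct` module. This is a sketch under assumptions: the helper names are hypothetical, the host is taken to be little endian, and the field values are arbitrary. It builds the 16 byte big endian image the DMA engine expects and then derives the four 32 bit values the host would write so that the bytes land in that order.

```python
import struct


def big_endian_image(pa, flags, ba, offset, word_count):
    """The 16 bytes as the DMA engine reads them: PA, F, BA, then OF and WC."""
    return struct.pack(">IIIHH", pa, flags, ba, offset, word_count)


def host_pio_words(image):
    """32 bit values a little endian host writes so the bytes land in order."""
    return list(struct.unpack("<4I", image))


image = big_endian_image(0x11223344, 0x00070100, 0x00012800, 0x000C, 0x0010)
words = host_pio_words(image)
```

Each word is the byte reverse of the corresponding big endian word, and in the fourth word the offset and word count halves also trade places, as the little endian register layout shows.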

Host Writing a message to the PP in memory

The PP views the memory as big endian in the same manner as the DMA engine, so the example in "Host Writing a DMA descriptor in memory" describes this path as well. Messages are either a byte stream, or require the host to manually byte swap larger data. The contents of a mailbox and the contents of any ring entry or other item in memory will follow the same format as shown in Table 8.

PP writing a message in memory that is DMA'ed to the host

If messages sent up to the host are simply a byte stream then there is no issue, since byte streams travel in an endian neutral way. If on the other hand the message includes data that are larger than a byte (e.g. a buffer pointer), byte swapping occurs and both ends of the communication must be aware of this.

For example, if the PP wants to send a 32 bit address to the host, it must byte swap within that word before sending it. That is, if the PP wants to send the 32 bit word 0xdeadbeef up to the host as a message, then the PP must put it into memory as 0xefbeadde (see Table 5).

Classification Engine CMEM fills

Writing instructions into CMEM in the Classification Engin e s takes on e of two paths; th e 
data is e ither DMA'ed or PlO' e d into PE memory from th e host and th e n copi e d from memory to 
CMEM by th e CE (using th o CE's FILL DMA unit), or th e host can PIO data directly into 
CMEM ov e r th e Regist e r interfac e (CMEM_DIAG access). 

The CMEM_DIAG path is word oriented and no twisting occurs, since it is all via the
register path. The 32 bit data and addresses seen in the host processor are the same 32 bit data
that are seen in the AP's registers. Diagnostic PIO's of data are sent to CMEM in the order [Least
Significant Word, then Most Significant Word] to construct the 64 bit instruction.

The FILL_DMA path takes 64 bit words from PE memory and writes them into the 64
bit CMEM. The compiler and host software always handle 64 bit instructions in their native
(that is, readable) form. CMEM instructions are laid out as native 64 bit units in host memory;
the host/compiler does not need to twist them to help the (other endian) recipient. When the data
arrives in PE memory, each 64 bit instruction will arrive byte swapped due to byte lane steering;
that is, the instruction

0xaabbccdd_eeff0123
in host memory will land in PE memory as

0x2301FFEE_DDCCBBAA
and the CE CMEM Fill data path is wired as shown in Table 4, so that the bytes land in the
correct place. Thus the MSB from PE memory will go to the LSB in CMEM, and vice versa.
This works whether the data arrived in PE memory via a PIO from the AP or via a DMA from
host memory prior to the FILL_DMA transfer into CMEM.

The upshot of all of this is that the CMEM_FILL DMA unit views PE memory as little
endian; and it doesn't matter to anyone using normal paths that CMEM microcode images are
byte swapped while they reside in the staging area in PE memory. This is all hidden from
software.
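
As an illustrative sketch only, the two CMEM fill paths can be modeled in Python. The function names `swap64` and `diag_pio_words` are hypothetical; the first reproduces the byte swapped staging image described for the FILL_DMA path, and the second splits an instruction into the word order described for the CMEM_DIAG path.

```python
def swap64(instr):
    """Reverse the byte order of a 64 bit instruction (staging image in PE memory)."""
    return int.from_bytes((instr & 0xFFFFFFFFFFFFFFFF).to_bytes(8, "big"), "little")

def diag_pio_words(instr):
    """Return the two 32 bit words in CMEM_DIAG order: least significant word first."""
    return [instr & 0xFFFFFFFF, (instr >> 32) & 0xFFFFFFFF]

# The example from the text.
assert swap64(0xAABBCCDD_EEFF0123) == 0x2301FFEE_DDCCBBAA
```

Because the fill data path reverses the bytes again on the way into CMEM, the instruction seen by the CE is the native form that the compiler produced.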

IV. Classification Engine

The Classification Engine (CE) is a microprogrammed processor designed to accelerate
predicate analysis in network infrastructure applications. The primary functions commonly used
in predicate analysis include parsing layers of successively encapsulated headers, table lookups,
and checksum verification.

Header parsing consists of extracting arbitrary single or multiple bit fields from those
headers, comparing those fields to one or more constants, then taking the results of these
comparisons and doing boolean reductions on multiple extraction results to reduce them finally
to a single "matches/doesn't match" status for each complex predicate statement; this single
boolean value can then be used to quickly dispatch the appropriate actions at the PP. The size of
each header is also determined so that the next level of protocol can be found and parsed in
sequence. Applications can also choose to examine packet contents in addition to the headers if
desired; the CE does not treat the header portion of a packet any differently from the payload
portion.

Table lookups can consist of comparing an extracted value against a table of constants, or
can involve generating a hash key from extracted values and then doing a lookup in a hash table
(content addressable table) to identify a record associated with packets matching that key; the
record can contain arbitrary application specific information such as permissions, counters,
encryption context, etc.

Checksum verification involves arithmetic functions across protocol headers and/or
packet payloads to determine if the packet contents are valid and thus comprise a valid packet. A
special adder parallel to the mask rotate unit, called split add, adds the upper and lower halves of a
32 bit operand together and produces a 17 bit result for use as an operand by the ALU; this is
used in TCP, UDP, and IP checksum computation.
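
The role of split add in checksum computation can be sketched as follows. This is a minimal Python model, not the hardware data path: `split_add` follows the description above, while `fold16` and `internet_checksum` are hypothetical helpers that show how the 17 bit partial sums could be accumulated into a standard ones' complement Internet checksum.

```python
def split_add(operand):
    """Add the upper and lower 16 bit halves of a 32 bit operand (17 bit result)."""
    operand &= 0xFFFFFFFF
    return (operand >> 16) + (operand & 0xFFFF)

def fold16(total):
    """Fold carries back in until the sum fits in 16 bits."""
    while total >> 16:
        total = (total >> 16) + (total & 0xFFFF)
    return total

def internet_checksum(words32):
    """Ones' complement checksum over a list of 32 bit words."""
    total = 0
    for word in words32:
        total += split_add(word)
    return ~fold16(total) & 0xFFFF

# A 20 byte IPv4 header with its checksum field (0xb861) in place sums to zero.
header = [0x45000073, 0x00004000, 0x4011B861, 0xC0A80001, 0xC0A800C7]
assert internet_checksum(header) == 0
```

Consuming a full 32 bit word per step halves the number of additions compared with summing 16 bit words one at a time.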

Since one purpose of the CE is to help the PP to avoid needing to touch packet contents
and thus fault portions of the packet into the PP's data cache, the CE can also be programmed to
extract arbitrary data fields and optionally do computations on them, then pass the results to the
applications running on the PP via the packet buffer's software data structure.

A software structure is carried in the packet buffer along with the packet and the
associated MAC status. This structure is written with predicate analysis results, hash table
pointers to records found, hash insertion pointers in the case of a failed search, checksum results,
a pointer to the base of each protocol found, extracted and computed fields, etc. for use by the
application(s) running on the PP.

In order to accelerate these functions, the Classification Engine loads some or all of the
packet from the PE's SDRAM based memory (PE Memory) into a packet memory (PMEM)
which it can then access randomly or sequentially to extract fields from the packet. A mask and
rotate unit allows arbitrary bit fields to be extracted from words of the packet which can then be
used as operands in computation or as comparison values for bulk table comparisons. Table
comparisons or individual arithmetic and logic operations can set one or more bits in the result
vector, which is a large, 1 bit wide register file. These RESVEC bits can then be accessed
randomly and arbitrary boolean operations can be done on pairs of bits to produce more RESVEC
bits, at a rate of up to two boolean bit operations per cycle, eventually reducing sets of bits to
single bit predicate results. Gang operations (GANGOPs) help optimize boolean reduction by
doing a logical operation (OR, AND, NOR, or NAND) on any number of selected bits within a
32 bit group of RESVEC bits in a single clock, producing a single RESVEC bit as a result. After
boolean reduction is complete, some or all of the result vector can then be spilled to the software
structure in the packet buffer in PE Memory for use by the Policy Processor.
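
A gang operation can be pictured with the following Python sketch. The function name `gangop`, the use of a 32 bit selection mask, and the string operation names are assumptions made for illustration; the text above specifies only that a logical OR, AND, NOR, or NAND is applied to the selected bits of one 32 bit group to produce a single bit.

```python
def gangop(group, select, op):
    """Reduce the selected bits of a 32 bit RESVEC group to a single bit.

    group  -- the 32 RESVEC bits of the group, packed into an integer
    select -- mask with a 1 for each bit taking part (assumed encoding)
    op     -- one of "OR", "AND", "NOR", "NAND"
    """
    group &= 0xFFFFFFFF
    select &= 0xFFFFFFFF
    any_set = (group & select) != 0        # OR of the selected bits
    all_set = (group & select) == select   # AND of the selected bits
    results = {"OR": any_set, "AND": all_set,
               "NOR": not any_set, "NAND": not all_set}
    return int(results[op])

assert gangop(0b1010, 0b1010, "AND") == 1
assert gangop(0b1010, 0b0110, "AND") == 0
```

A predicate such as "any of these eight comparisons matched" then costs one operation instead of a chain of seven pairwise boolean operations.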

A 32 bit Arithmetic and Logic Unit (ALU) and a set of general purpose 32 bit registers
(GPREG) allow for general computation as well.

Program flow control in the branch unit allows the microcode to decide if the next
instruction in the microcode control store (CMEM) comes from a sequential location, from a
relative branch value which can be an immediate value in the microword or the contents of a
GPREG, or (in the case of a RETURN) from the top of the hardware microstack; microstack
values are pushed when a CALL style of branch is executed, and the microstack is accessed in
LIFO (last in, first out) fashion to support nested subroutines in the microcode. Branch, Call,
and Return operations are all conditional based on any of the rich set of condition codes
provided. When the microcode bit "BRANCH_EN" is set then a Branch, Call, or Return is
executed if the selected condition code is true; calls and returns are done if the associated bit
CALL or RET is set in the control word when BRANCH_EN is set. Due to pipelining of the
microsequencer all program flow changes have a 1 cycle delay before taking effect, so the
instruction following any of the program flow control instructions (the "branch delay slot") is always
executed regardless of the success or failure of the conditional flow control instruction; as a
result of this the address stored in the microstack upon a successful CALL is the address of the
first instruction following the delay slot.

The CE also contains several special purpose registers and also supports execution of
many special operations. Special purpose registers include the interface to PE memory, the
condition code register, a memory base pointer register used for base index access to packet
buffers in PE memory, a chip wide timestamp timer, and instrumentation and diagnostic registers
including a counter which monitors execution time and a counter which tracks stall cycles due to
various memory interface delays.

The memory interface appears to the microcode as 3 FIFOs: DFIFO_W receives one or
more words of data to be packed into a memory burst access for stores, DFIFO_R unpacks
requested bursts of data that have been read from memory, and MEM_ADDR receives PE
memory addresses along with size and direction information. Reads (or "loads") are non
blocking; microcode schedules a load and then can take the data from DFIFO_R at any time
later; if the data has not yet arrived then the pipeline will stall until it does. The pipeline will also
stall if there is an attempt to write data to DFIFO_W and there is no room or if there is an attempt
to schedule another address in MEM_ADDR and there is no room. Both of these conditions are
self clearing as the FIFOs drain to the chip's memory controller. Extensive error checking logic
uses counters to track the state of various parts of the memory interface and will not allow
microcode to oversubscribe DFIFO_R nor to issue a write ("store") to memory unless precisely
the right number of words of data have already been scheduled in DFIFO_W. Memory access
sizes are 1, 2, 4, or 8 32 bit words.

Using the memory interface for a store consists of writing the desired number of words of
data to DFIFO_W, then committing the store by scheduling the address into MEM_ADDR along
with the appropriate size code and the direction flag for a store. Using it for a load consists of
scheduling the address, size, and direction flag for a load into MEM_ADDR, then consuming
precisely that many words in order from DFIFO_R at some later time. DFIFO_R holds up to 4
maximum sized bursts or up to 32 words of data scheduled as smaller reads, so properly written
microcode can often hide the latency of reading PE Memory by scheduling several loads before
consuming the result of the first. Bulk data movement such as filling PMEM with a packet can
keep several reads outstanding in a pipelined fashion to move data at the maximum memory
bandwidth available.
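
The store and load protocol described above can be sketched as a small Python model. Everything here is a simplification made for illustration: the class name `MemoryInterface`, the dictionary used as backing memory, the use of word addresses, and the fact that accesses complete immediately (there is no latency and no stall) are all assumptions. The model keeps only the ordering rules and the error checks named in the text.

```python
from collections import deque

VALID_SIZES = (1, 2, 4, 8)   # burst sizes in 32 bit words
DFIFO_R_WORDS = 32           # read data capacity in words

class MemoryInterface:
    def __init__(self, memory):
        self.memory = memory      # word address -> value (hypothetical backing store)
        self.dfifo_w = deque()    # write data waiting for a store address
        self.dfifo_r = deque()    # read data waiting to be consumed

    def write_data(self, word):
        """Queue one word of store data in DFIFO_W."""
        self.dfifo_w.append(word & 0xFFFFFFFF)

    def schedule(self, addr, size, store):
        """Model a write to MEM_ADDR: address plus size and direction."""
        if size not in VALID_SIZES:
            raise ValueError("size must be 1, 2, 4, or 8 words")
        if store:
            if len(self.dfifo_w) != size:
                raise RuntimeError("DFIFO_W does not hold exactly the words for this store")
            for i in range(size):
                self.memory[addr + i] = self.dfifo_w.popleft()
        else:
            if len(self.dfifo_r) + size > DFIFO_R_WORDS:
                raise RuntimeError("DFIFO_R oversubscribed")
            for i in range(size):
                self.dfifo_r.append(self.memory.get(addr + i, 0))

    def read_data(self):
        """Consume one word of load data from DFIFO_R."""
        if not self.dfifo_r:
            raise RuntimeError("no read data has been scheduled")
        return self.dfifo_r.popleft()
```

A store is therefore "data first, address last", while a load is "address first, data later"; in the hardware the gap between scheduling a load and consuming its data is where useful work can be overlapped with memory latency.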

These non blocking loads help to accelerate hash table searches and linked list searches;
once the header of a record has been fetched, the forward pointer can be used to speculatively
fetch the next record before doing any key comparisons with the current one, hiding much of the
memory latency and generally overlapping computation and memory access so that hash
searches can be done as fast as the records can be fetched from the SDRAM (PE Memory).

Special Operations include various administrative functions that the CE uses; these
include functions such as incrementing MCCONS and RCCONS in the RTU, flash clearing the
general purpose registers and the result vector, selecting immediate or index register addressing
for PMEM, loading the PMEM index pointer and setting or clearing its sequential access mode,
managing a sequential index counter for RESVEC used for table comparisons and result spills,
halting the sequencer or putting it into a power saving sleep mode, managing certain special
condition codes, etc.

Bulk Table Comparisons (using the cmprn instruction) implement the CE's only multi
cycle instruction; prior to executing cmprn, one or two 32 bit comparison values are loaded into
general purpose registers. In the first cycle of a cmprn instruction one or two general purpose
registers are identified as the A side and B side comparison values (both can be the same register
if desired), a starting index into RESVEC is set, four special condition codes associated with
bulk table comparisons are cleared, an instruction length counter is initialized to the instruction
length "N", and the entire processor is set for cmprn mode. The next "N" 64 bit microcode
words are interpreted as pairs of 32 bit values for comparison rather than as microcode; one 32
bit value is compared to the A side register and the other is compared to the B side register, and
if either matches the associated bit in the (even, odd) bit pair pointed to by the RESVEC_INDEX
is set; then the RESVEC_INDEX is incremented to point at the next bit pair, the length counter
is decremented, and the next comparison value pair is fetched from CMEM. The process is
repeated until the length counter reaches 0.

Associated with this process are the four condition code bits MATCH_A, MATCH_B,
MATCH_A_OR_B, and MATCH_A_AND_B, which indicate that at least one table value
matched on the A side, on the B side, on either side, or on A and B side together (as a 64 bit
match), respectively.

Given this facility it is possible to compare one extracted value to (2 * N) constants or to
compare two values to N constants each, in a total of (N+1) cycles. These bulk table lookups are
useful for rapidly searching small tables as part of predicate analysis; hash table lookups are used
for larger tables when it becomes more time efficient to do so.
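
The behavior of a bulk table comparison can be sketched in Python as below. The function name `cmprn`, the list used to stand in for RESVEC, and the mapping of the A side to the even bit and the B side to the odd bit of each pair are assumptions for illustration; the text states only that the associated bit of the (even, odd) pair is set on a match.

```python
def cmprn(a_value, b_value, table, resvec, start_index):
    """Compare two register values against N (a_const, b_const) pairs.

    resvec is a list of bits that is updated in place; start_index must be
    even and selects the first (even, odd) bit pair.
    Returns the four condition codes associated with the comparison.
    """
    flags = {"MATCH_A": False, "MATCH_B": False,
             "MATCH_A_OR_B": False, "MATCH_A_AND_B": False}
    index = start_index
    for a_const, b_const in table:      # one 64 bit CMEM word per iteration
        hit_a = (a_const == a_value)
        hit_b = (b_const == b_value)
        if hit_a:
            resvec[index] = 1           # assumed: A side sets the even bit
        if hit_b:
            resvec[index + 1] = 1       # assumed: B side sets the odd bit
        flags["MATCH_A"] |= hit_a
        flags["MATCH_B"] |= hit_b
        flags["MATCH_A_OR_B"] |= hit_a or hit_b
        flags["MATCH_A_AND_B"] |= hit_a and hit_b   # both halves match: 64 bit match
        index += 2                      # advance to the next bit pair
    return flags
```

Using the same register for both sides, for example to test one port number against six well known ports, takes three table words:

```python
resvec = [0] * 16
flags = cmprn(80, 80, [(21, 22), (80, 443), (8080, 80)], resvec, 4)
assert resvec[4:10] == [0, 0, 1, 0, 0, 1]
```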

Another special condition code is "Sticky zero" or "SZ". It is used to cumulatively
check status on a chain of equality comparisons of the form "if (A = X) and (B = Y) and (C = Z)
and (D = W) then..." by first setting the SZ bit in the Condition Code Register using a special
operation, then doing a series of equality comparisons or other arithmetic functions, then doing a
conditional test of SZ; the bit stays set as long as the results of all intervening operations that set
condition codes have the "data equals zero" status. Any "data not equal to zero" status result in
the series will cause SZ to clear and to stay clear.
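
The sticky zero mechanism can be illustrated with a short Python sketch. The class and function names are hypothetical, and modeling each equality comparison as a subtraction whose result is tested for zero is an assumption about how microcode would use the ALU.

```python
class ConditionCodes:
    def __init__(self):
        self.sz = False

    def set_sz(self):
        """Special operation: arm the sticky zero bit."""
        self.sz = True

    def update(self, result):
        """Called for every operation that sets condition codes."""
        if (result & 0xFFFFFFFF) != 0:
            self.sz = False       # once cleared, SZ stays clear

def all_equal(pairs):
    """Evaluate (A = X) and (B = Y) and ... with a single final test of SZ."""
    cc = ConditionCodes()
    cc.set_sz()
    for left, right in pairs:
        cc.update(left - right)   # equality check as a subtraction
    return cc.sz

assert all_equal([(1, 1), (2, 2), (3, 3)]) is True
assert all_equal([(1, 1), (2, 5), (3, 3)]) is False
```

The benefit is that a chain of comparisons needs only one conditional branch at the end rather than one branch per comparison.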

A messaging facility between the CE and the PP is provided; the CE can set any of 4
status bits which cause status to become visible to the PP (Message Out bits) and the PP can set
any of 4 status bits (Message In bits) which the CE can test as condition codes. These bits can
be used for any messaging purpose as assigned by software.

Two other condition code bits are "RX_RING_RDY" and "RECLASS_RING_RDY",
which are used by the RTU to indicate to the CE that there is at least one buffer pointer for it to
process in the two buffer pointer rings on which it is a consumer; one ring is the "RX Ring" and
always carries packets from the associated RX MAC to this CE, and the other is called the
"Reclassification Ring" through which any party can schedule a packet to be processed on this
CE.

In summary, the Classification Engine tests the two ring status bits and the 4 message bits
in a dispatch loop, and calls the appropriate service routine when a condition is found to be
active. (When no conditions are active the dispatch loop sets the CE into "sleep mode" to reduce
power consumption.) The ring service routines fetch a packet buffer pointer from the associated
ring, fetch some or all of the packet (only as much as the microcode will need to examine, or all
of the packet if checksums are to be validated on the payload), then start with the first protocol
header and execute a series of application specific operations to extract fields from the packet,
identify and process arbitrary protocol headers, do table lookups via bulk comparisons or hash
table searches as directed by the application, do checksum verifications as programmed, do
boolean reduction on interim results, extract and optionally compute on arbitrary fields in the
packet, and finally write all results to a data structure in the per packet result area that travels
with the packet in the packet buffer in SDRAM. The results written include the set of single bit
predicate analysis results, hash search results (a pointer to the record that matches the key
extracted from this packet or a pointer to where a hash record should be inserted if one does not
exist and the application wants to create one, for any number of different tables with different
keys), plus any extracted or computed values (such as index pointers to the start of each layer of
protocol header) desired by the application. Microcode can be loaded into CMEM by the AP or
PP, or by the CE itself once it has been loaded with its initial microcode.
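
The dispatch loop summarized above can be sketched as a single polling step in Python. The message bit names `MSG_IN0` through `MSG_IN3`, the fixed priority order, and the handler table are hypothetical; the text names only the two ring ready bits and states that there are four message bits.

```python
CONDITIONS = ("RX_RING_RDY", "RECLASS_RING_RDY",
              "MSG_IN0", "MSG_IN1", "MSG_IN2", "MSG_IN3")

def dispatch_once(active, handlers):
    """Call the service routine for the first active condition.

    active   -- set of condition names that are currently true
    handlers -- mapping from condition name to a service routine
    Returns the name that was serviced, or "SLEEP" when nothing is active.
    """
    for name in CONDITIONS:           # assumed priority: rings before messages
        if name in active:
            handlers[name]()
            return name
    return "SLEEP"                    # the CE would enter sleep mode here
```

For example, with a handler table that records which routine ran, `dispatch_once({"MSG_IN2", "RECLASS_RING_RDY"}, handlers)` services the Reclassification Ring first under the assumed ordering.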

The following pages include a block diagram of the CE, a table identifying the various
microcode control bits, formats for the microcode, and tables of relevant values.

CE Block Diagram

FIG. 13 shows a block diagram of the Classification Engine.

Overview of the Classification Engine in Figure 13.

The Classification Engine is a pipelined microsequencer. A 64 bit
microword is fetched from Control Store CMEM 1202 using an address supplied by register PC
1234, and is stored in the instruction register I REG 1216. This cycle is referred to as the Fetch
cycle 1302.

The 64 bit microword in I REG 1216 has 7 bits each dedicated to enabling the retirement
of a result by causing registers to be loaded. One of these bits is reserved for future
enhancements, while 6 of them have specified functions as described in Table 16. This group of
signals is known as the write enables WE[6:0]. The WE bits also have function specific names
as shown in Table 1: BRANCH_EN, REG_WE, CC_WE, RESVEC_WE, PMEM_WE, and
SPECOP_EN.

BRANCH_EN enables conditional program flow changes if a condition test is met. It
controls units in the Address Generation Unit 1230.

REG_WE enables retirement of 32 bit results in the word oriented half of the machine to
all of the general purpose registers and special registers listed in Table 17. It also has side effects
of incrementing the PMEM 1204 index counter PCNT 1222 or dequeuing a word of data from
DFIFO_R 1250 under certain circumstances.

CC_WE enables the writing of the arithmetic result bits in the condition code register.

PMEM_WE enables writes into packet memory PMEM 1204.

RESVEC_WE enables stores in the bit oriented result vector RESVEC 1208.

SPECOP_EN enables special operations including writing to PCNT 1222, NCNT 1221,
BDST_CNT 1226, and other functions listed in Table 22.

The pipeline is 3 stages deep as shown in FIG. 14. The Fetch stage 1302 has been
described above. The Decode stage 1304 takes place from the output of I REG 1216 to the
inputs of D REG 1212, PC 1234, and RESVEC 1208. The Execute stage 1306 takes place from
the output of D REG 1212 to the inputs of all general purpose registers and special purpose
registers listed in Table 17; ALUOUT can be written to GPREG 1206, MEM_ADDR 1254,
DFIFO_W 1252, the CTRL_FILL registers 1210, and the special registers in block 1270. Figure
14 shows in detail what occurs in each stage of the pipeline, and at what stage various types of
results are retired. Pipeline stall conditions suppress all of the WE bits so that the same condition
holds from one cycle to the next, until the stall condition clears. Since this stall condition
affects all microcode controlled changes of state in the CE, it is implicit in all subsequent
discussion of operation of the pipeline and the effect of stalls needs no further discussion. The
causes of pipeline stalls are described in subsequent sections.

Program Flow Control

The address generation unit 1230 determines what address will be used to fetch the next
microword from CMEM. The Program Counter (PC) 1234 contains the address of the current
instruction being fetched. If BRANCH_EN is a '0' then the next value of PC is an increment of
the current value; with no branches the microsequencer fetches microwords sequentially from
CMEM. When BRANCH_EN is asserted a test of condition codes listed in Table 21 is done as
selected by bits CCSEL[4:0] and inverted by FALSE, both fields described in Table 16. If the
condition test returns a "1" then the conditional branch will be taken, otherwise PC 1234 will be
loaded with the increment of its current value. The bit REG is tested; if it is '0' then the address
PC is added to the value of the bits BRANCH_ADDR[9:0] to generate the branch value of PC; if
it is '1' then the address PC is added to the value on bus REGB[9:0] to generate the branch value.
The bus REGB carries the output of GPREG 1206 port DO1, which carries the value of the
general purpose register selected with bits RSRCB[2:0].

Next bit RET is tested. If it is a '1' then PC is loaded with the output of the microstack
1232, and the microstack's stack pointer is decremented by 1. The microstack 1232 is a Last in,
First out (LIFO) structure used to support micro subroutines, nested up to 8 deep. If RET was a '0'
then PC is loaded with the calculated branch value described above instead, and CALL is
examined. If CALL is a '1' then the microstack 1232 has its stack pointer incremented, and the
incremented value of the previous PC is written into the microstack using the new value of the
stack pointer. In this way the address stored in the microstack 1232 when a CALL is executed is
the address of the next instruction that would have been executed sequentially if the branch had
not succeeded; thus when calling a subroutine it is the address of the next instruction to return to
after executing a RET to terminate the subroutine.

Since all program flow control decisions are made in the Decode stage 1304, the
sequential instruction which follows is already in the fetch stage and is always executed. This
means that there is always a 1 cycle delay between fetching a successful BRANCH_EN
instruction and its effect on PC. The instruction which follows a branch instruction, and is
always executed regardless of the success or failure of the branch, is called a delay slot
instruction. A delay slot instruction may not have BRANCH_EN set. The return value stored in
the microstack 1232 after a successful CALL is the address of the instruction following the delay
slot instruction of the CALL.
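
The effect of the delay slot on execution order can be shown with a small Python trace. This is a sketch under several assumptions: the instruction records and the `run` function are hypothetical, branch targets are given as absolute indices rather than the PC relative values described above, every branch is treated as taken (condition tests are omitted), and a `halt` flag is added only so that the example terminates.

```python
def run(program, max_steps=32):
    """Return the names of the instructions in the order they execute."""
    stack, trace = [], []
    pc, pending = 0, None            # pending: PC to load after the delay slot
    for _ in range(max_steps):
        if pc >= len(program):
            break
        ins = program[pc]
        trace.append(ins["name"])
        if ins.get("halt"):
            break
        next_pc = pc + 1
        if pending is not None:      # this instruction is a delay slot
            next_pc, pending = pending, None
        elif ins.get("ret"):
            pending = stack.pop()    # return address from the microstack
        elif "branch" in ins:
            if ins.get("call"):
                stack.append(pc + 2) # address after the delay slot
            pending = ins["branch"]
        pc = next_pc
    return trace

program = [
    {"name": "call", "branch": 4, "call": True},
    {"name": "slot_a"},              # delay slot of the CALL, always executed
    {"name": "after_call"},
    {"name": "halt", "halt": True},
    {"name": "sub_body"},
    {"name": "ret", "ret": True},
    {"name": "slot_b"},              # delay slot of the RET, always executed
]
assert run(program) == ["call", "slot_a", "sub_body", "ret",
                        "slot_b", "after_call", "halt"]
```

The trace shows both delay slot instructions executing before control transfers, and the subroutine returning to the instruction after the CALL's delay slot.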

The microstack 1232 in the preferred embodiment of the invention consists of 8 registers
with a multiplexer (mux) selecting one of them as the microstack output. A single 3 bit counter
is used as the stack pointer; it is decoded in such a way that when the read address is N the write
address is (N+1), so that a read and decrement or a write and increment can be executed in a single
cycle. Attempting to execute a CALL when the microstack already has 8 valid entries in it, or
attempting to execute a RET when the microstack has no valid entries in it, causes the pipeline to
halt and signal STACK_ERROR status to the Policy Processor 244.
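
The microstack limits can be sketched in Python as follows. The class names are hypothetical, and the model tracks a count of valid entries instead of reproducing the 3 bit pointer decoding; raising an exception stands in for halting the pipeline and signaling STACK_ERROR.

```python
class StackError(Exception):
    """Stands in for the STACK_ERROR status reported to the Policy Processor."""

class Microstack:
    DEPTH = 8

    def __init__(self):
        self.regs = [0] * self.DEPTH
        self.count = 0                    # number of valid entries

    def call(self, return_addr):
        """Push a return address; fails if 8 entries are already valid."""
        if self.count == self.DEPTH:
            raise StackError("CALL with 8 valid entries")
        self.regs[self.count] = return_addr
        self.count += 1

    def ret(self):
        """Pop the most recent return address; fails if the stack is empty."""
        if self.count == 0:
            raise StackError("RET with no valid entries")
        self.count -= 1
        return self.regs[self.count]
```

Nested subroutines unwind in last in, first out order: after `call(10)` followed by `call(20)`, the first `ret()` yields 20 and the second yields 10.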

CCSEL, FALSE, BRANCH_ADDR, RSRCB, REG, CALL, and RET are all defined in
Table 16.

32 bit operations

The Classification Engine has two distinct data domains; one is oriented around 32 bit
data, and the other is oriented around 1 bit boolean data in RESVEC 1208 and the Bit ALU
1260. There are a few places where data is communicated between these two domains. This
section describes the 32 bit domain.

The 32 bit domain centers around selecting the A side and B side operands which are
then fed into AIN and BIN of the ALU 1214. The output ALUOUT from ALU 1214 is then
written back to one of the 32 bit destinations, and optionally the arithmetic condition codes are
set if CC_WE is '1'. The ALU 1214 is a 32 bit Arithmetic and Logic Unit which performs any of
the arithmetic functions listed in Table 19 or any of the logic functions listed in Table 20 under
control of the bits ALUOP[5:0] defined in Table 16.

GPREG 1206 is a 32 bit general purpose register file comprising eight 32 bit registers. It
has two read ports and one write port. Read port DO0 has the contents of the register selected by
RSRCA[2:0], and read port DO1 has the contents of the register selected by RSRCB[2:0]. The
register selected by RDST[2:0] is written to with the value of ALU_OUT if RDST[3] is '0' and
REG_WE is '1'. In order to make newly generated register values available in the subsequent
instruction, the pipeline delay of writing into GPREG and reading out the new value is squashed
through use of Bypass Multiplexers 1221 and 1223, which are used to forward ALU_OUT to
busses REGA and REGB if RDST of the instruction in the execute stage matches RSRCA or
RSRCB, respectively, in the instruction in the decode stage, thus hiding the pipeline delay. The
A side operand is selected among the A side sources listed in Table 17 by multiplexer 1225. The
selected data is then sent into the split add mask and rotate unit 1210. Bits [31:16] of the data
are added to bits [15:0] of the data in the adder 1218, and the 17 bit result is concatenated with
zeros in bits [31:17] to create the split add result. The selected data is also sent to the Mask Unit
1212 where it is bitwise AND'ed with MASK[31:0] if MSK[1] is a '1', or is passed through
unmodified if MSK[1] is a '0'; the result from MASK 1212 is sent through the ROTATE barrel
shifter 1211 where the data is rotated right by the number of bits specified in ROT[1:0] in the
microword. Finally, MSK[0] is used to select between the split add result and the mask rotate
result in multiplexer 1216, and the result is presented to D REG 1212 as the A side operand for
the execute stage 1306. The B side operand is selected among the B side sources listed in Table
18 using multiplexer 1228, and is presented to the D REG 1212 as the B side operand for the
execute stage 1306.
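
The A side operand path can be sketched in Python as below. The function names are hypothetical. Two points are assumptions rather than statements from the text: the polarity of MSK[0] (here a '1' selects the split add result), and the rotate amount, which is passed as a plain integer from 0 to 31 rather than decoded from the microword field.

```python
def rotate_right32(value, bits):
    """Rotate a 32 bit value right by the given number of bits."""
    bits %= 32
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF

def a_side_operand(data, mask, msk1, msk0, rot):
    """Model the split add and mask/rotate paths for the A side operand."""
    data &= 0xFFFFFFFF
    split = (data >> 16) + (data & 0xFFFF)     # 17 bit sum, bits [31:17] are zero
    masked = (data & mask) if msk1 else data   # MSK[1] enables the AND with MASK
    rotated = rotate_right32(masked, rot)
    return split if msk0 else rotated          # assumed polarity of MSK[0]

# Extract the byte in bits [15:8] of a packet word and right justify it.
assert a_side_operand(0x12345678, 0x0000FF00, 1, 0, 8) == 0x56
```

Masking before rotating is what allows an arbitrary bit field to be isolated and aligned in a single pass before it reaches the ALU.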

RSRCA, RSRCB, ALUOP[5:0], RDST[3:0], MASK[31:0], MSK[1], MSK[0], ROT[1:0]
are all described in Table 16.

PMEM

Packet Memory (PMEM) 1204 is a (32 bit by 512 entry) RAM with one read port and
one write port used to hold some or all of the packet being processed, and also to hold arbitrary
data generated by the program. PMEM 1204 can be written from two sources: DFIFO_R 1250,
or the REGA bus from the general purpose registers GPREG 1206, where the register is selected
by RSRCA[2:0]; such writes occur when PMEM_WE is a '1' in the microword. PMEM is read as
one of the A side sources, selectable as one of the "special register" sources.

PMEM 1204 addressing depends on the state bit USE_PCNT. When USE_PCNT is '0',
then PMEM 1204 is addressed by PINDEX[10:2] from the microword. When USE_PCNT is '1'
then the address to PMEM 1204 is provided by the counter/register PCNT 1222. USE_PCNT is
set and cleared via special operations. When SPECOP_EN is '1' and LD_PCNT is '1', then
PCNT_REG is examined. If it is a "1" then PCNT is loaded with the value of bits [10:2] of the
general purpose register in GPREG 1206 selected by RSRCB[2:0]; alternatively if PCNT_REG
is a "0" then PCNT is loaded with the value of PINDEX[10:2] in the microword. In either case
the state bit USE_PCNT is set. Additionally, bit PCNT_INC is examined; if it is a "1" then
PCNT_INC_MODE is set, or if it is a "0" then PCNT_INC_MODE is cleared. The state bit
PCNT_INC_MODE determines if PCNT 1222 holds a static value during the PCNT_MODE
period, or if it increments by one each time PMEM is written to or is used as a register source.
USE_PCNT clears when an instruction has SPECOP_EN equal to "1" and UNLOCK_PCNT also
equal to "1".
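
The PMEM addressing modes can be sketched as a small state machine in Python. The class and method names are hypothetical. The sketch assumes that PCNT_INC is examined as part of the same LD_PCNT special operation, that `pindex` is passed as the 9 bit word index carried in PINDEX[10:2], and that PCNT wraps at 512 entries; none of these details is stated explicitly above.

```python
class PmemAddressing:
    def __init__(self):
        self.use_pcnt = False     # state bit USE_PCNT
        self.inc_mode = False     # state bit PCNT_INC_MODE
        self.pcnt = 0             # counter/register PCNT

    def special_op(self, ld_pcnt=False, pcnt_reg=False, pcnt_inc=False,
                   unlock_pcnt=False, pindex=0, regb=0):
        """Model one instruction with SPECOP_EN equal to '1'."""
        if ld_pcnt:
            # PCNT_REG selects bits [10:2] of the register, else the microword index.
            self.pcnt = ((regb >> 2) if pcnt_reg else pindex) & 0x1FF
            self.use_pcnt = True
            self.inc_mode = pcnt_inc
        if unlock_pcnt:
            self.use_pcnt = False

    def access(self, pindex=0):
        """Return the PMEM word address used by one read or write."""
        if not self.use_pcnt:
            return pindex & 0x1FF
        addr = self.pcnt
        if self.inc_mode:
            self.pcnt = (self.pcnt + 1) & 0x1FF   # assumed wrap at 512 entries
        return addr
```

With increment mode set, successive accesses walk through PMEM without the microcode supplying an index each time, which suits sequential header parsing and bulk fills.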

DFIFO_R, RSRCA[3:0], RSRCB[3:0], PINDEX[10:2] are all defined in Table 16;
LD_PCNT, PCNT_REG, PCNT_INC, UNLOCK_PCNT are all defined in Table 22.

Interface to Memory 260

SDRAM Memory 260 can be read and written by the microcode. The memory interface
visible to the microcode consists of the MEM_ADDR FIFO 1254, the write data FIFO
DFIFO_W 1252, and the read data FIFO DFIFO_R 1250. Writes to memory 260 are called
stores, and reads from memory 260 are called loads. Loads and stores can be of size 1, 2, 4, or 8
words of 32 bits each. The address of a memory access must be size aligned for the specified
burst; that is, the address for a 2 word memory access must be on an 8 byte boundary, the
address of an 8 word access must be on a 32 byte boundary, etc.
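
The size alignment rule can be expressed as a one line check. The Python helper below is illustrative only; its name and its use of byte addresses are assumptions.

```python
def is_size_aligned(byte_addr, size_words):
    """Return True if byte_addr is aligned for a burst of size_words 32 bit words."""
    if size_words not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 words")
    return byte_addr % (size_words * 4) == 0

assert is_size_aligned(8, 2)        # 2 words need an 8 byte boundary
assert not is_size_aligned(16, 8)   # 8 words need a 32 byte boundary
```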

To schedule a store, precisely the number of words for the specified size of transfer are
written to the special register destination DFIFO_W 1252, then the address (along with control
information MEM_SIZE[1:0] and MEM_DIR=STORE) is written into the address FIFO
MEM_ADDR 1254, which triggers the memory interface to issue the store. The microsequencer
is decoupled from the memory system by the FIFOs 1252 and 1254, and thus can continue
operation while the memory interface processes the store operation. The FIFOs 1254 and 1252
can hold up to 8 addresses and 16 words of data, respectively, so that in general more than one
store operation can be outstanding without stalling the pipeline. The entire pipeline stalls when
the execute stage 1306 operation is a write to either MEM_ADDR 1254 or to DFIFO_W 1252
and the target FIFO does not have room for another word. The situation will clear as the FIFO
drains its current operation to memory 260 so the stall condition is transient.

To schedule a load, the address (along with control information MEM_SIZE[1:0] and
MEM_DIR=LD) is written to special register destination MEM_ADDR, and some time later the
microcode can obtain the requested data from the read data FIFO DFIFO_R 1250. Between the
time that the microsequencer scheduled the load operation and the time the data is consumed,
there is latency to access the memory system 260. The microcode can choose to execute any
number of instructions between the time the load is scheduled in MEM_ADDR 1254 and the
data is consumed from DFIFO_R 1250, since the loads are non blocking. However, if the
microcode attempts to read data from DFIFO_R 1250 and there is no data available, the pipeline
will stall until such time as requested data has returned from memory 260. More than one load
can be scheduled before any data is consumed; DFIFO_R 1250 has room for up to 16
SFRLIB1\JKS\5091501.05 



Page 80 



double words (128 byt e s) of data, 

The microcode is responsible for ensuring that it never attempts to read data from
DFIFO_R 1250 when no more words of read data have been scheduled, nor to issue a store
address to MEM_ADDR 1251 when DFIFO_W 1252 has not been written with precisely the
number of words specified in the size of the store. The microcode is also responsible for never
oversubscribing DFIFO_R 1250, that is, scheduling more outstanding words of read data than
DFIFO_R 1250 has room for. Any of these conditions is detected by error checking logic in the
CE, which will halt the CE and report violations to the Policy Processor 244 if the memory
system is used incorrectly.
Bit oriented operations

RESVEC 1208 is a 1 bit by 512 entry register file with special characteristics. It has one
write port and 3 read ports; this means that in any one instruction 3 bits can be read and one write
can be issued. The write can be to one bit, or to an adjacent pair of bits whose address differs
only in the least significant bit, referred to here as an even odd bit pair. For certain operations
RESVEC 1208 can also be accessed as a 32 bit by 16 entry register file.

When RESVEC_WE is a '1' and the microcode bit 2BIT is a '0' then a single bit in
RESVEC 1208 is written with the data presented on the DIN0 data input port; that data is
selected from among 4 different sources under control of the RES0_SEL[1:0] bits in the
microword. Alternatively if 2BIT is a '1' then the DIN0 data is written to the even numbered bit
in the destination, and DIN1, selected from among two sources by RES1_SEL, is written to the
odd numbered bit of the pair.

The destination address in RESVEC 1208 comes either from RES_BIT_DST[9:0] if state
bit USE_WCNT is '0', or from BDST_CNT 1226 if USE_WCNT is a '1'. USE_WCNT is set
when SPECOP_EN is '1' and LD_BDST_CNT is a '1'. In that case BDST_CNT 1226 is written
with the value RES_BIT_DST[9:1]. At the same time BDST_CNT 1226 is loaded, the bit
BDST_CNT_MODE in the microword is examined. If it is '0' then BDST_CNT 1226 is set to
increment by 2; if it is '1' then BDST_CNT 1226 is configured to increment by 32. The former is
used in the special instruction CMPRN to sweep across sequential bit pairs in each cycle of the
instruction and to write to them, while the latter is used for the RESVEC 1208 read address port
RA0 to sequentially read 32 bit groups of RESVEC 1208 bits as the B side special register
RES_VEC.

The bit oriented ALU 1260 contains two boolean logic units 1261 and 1268 and one gang
operation unit 1262. Boolean logic unit 1261 takes the two bits selected by
RES_BIT_SRC_A[9:0] and RES_BIT_SRC_B[9:0] and applies the boolean operation
BITOPAB[3:0] as specified in Table 20. The 1 bit result RES_BIT0 is one of the potential
sources for write data port DIN0 on RESVEC 1208. Boolean logic unit 1268 similarly takes the
operands selected by RES_BIT_SRC_A[9:0] and RES_BIT_SRC_C[9:0] and applies
BITOPAC[3:0] in a substantially similar manner, generating the 1 bit result RES_BIT1 which
may be selected as the DIN1 write data source if 2BIT is '1'. Thus in one cycle up to two bitwise
boolean operations can be executed if the two operations have one common operand. The
GANGOP unit 1262 takes the 32 adjacent bits from RESVEC 1208 selected by
RES_BIT_SRC_A[9:5] and treats them as a word operand. MASK[31:0] is used to select which
bits of that word will contribute to the gang results, then an AND, OR, NAND, or NOR
operation is performed on all of the selected bits as instructed in GANGOP[1:0], and the result
bit RES_GANG is presented as one of the possible sources for DIN0 on RESVEC 1208.
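
The gang operation can be modeled in a few lines. The Python sketch below is a software illustration, not the hardware implementation; the function name is hypothetical. It takes the GANGOP[1:0] encoding from Table 16 (bit 1 chooses AND versus OR, bit 0 inverts the result) and assumes that an AND over an empty mask yields 1, which the source does not specify.

```python
def gang_op(word: int, mask: int, gangop: int) -> int:
    """Reduce the masked bits of a 32 bit RESVEC word to a single result bit.

    gangop bit 1: 1 = AND, 0 = OR over the selected bits.
    gangop bit 0: 1 = invert the result (giving NAND or NOR).
    """
    word &= 0xFFFFFFFF
    mask &= 0xFFFFFFFF
    if gangop & 0b10:
        result = (word & mask) == mask  # every selected bit is 1
    else:
        result = (word & mask) != 0     # at least one selected bit is 1
    if gangop & 0b01:
        result = not result
    return int(result)
```

For example, with word 0b1010 and mask 0b0110 the AND form gives 0 and the OR form gives 1, since only one of the two selected bits is set.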

The condition code selected by CCSEL[1:0] and optionally inverted with FALSE can
also be selected as the data source for port DIN0.

The remaining sources for DIN0 and DIN1 on RESVEC 1208 are the CMPR_A,
CMPR_B result bits from one cycle of a bulk comparison instruction CMPRN, described below.

RESVEC 1208 address fields for sources and destination are specified as 10 bits, even
though only 9 bits are used in the preferred embodiment; the extra bit allows for a doubling of
the size of RESVEC 1208 in future generations of the device.
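
To make the addressing concrete, the following Python sketch splits a 10 bit RESVEC address into the word and bit selects described in Table 16 (bits [8:5] select one of 16 words, bits [4:0] select a bit within the word, bit [9] is unused). The function names are hypothetical and the code is only a software illustration.

```python
def split_resvec_address(addr: int) -> tuple:
    """Return (word_index, bit_index) for a 10 bit RESVEC address field."""
    bit_address = addr & 0x1FF  # bit [9] is reserved and ignored here
    return bit_address >> 5, bit_address & 0x1F


def even_odd_pair(addr: int) -> tuple:
    """Return the (even, odd) addresses of the pair containing addr."""
    bit_address = addr & 0x1FF
    return bit_address & ~1, bit_address | 1
```

Address 37, for instance, maps to bit 5 of word 1, and belongs to the even odd pair (36, 37).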

Writes to RESVEC 1208 are retired at the end of the Decode stage 1304 and can thus be
used immediately as an operand in the subsequent instruction, without need for bypassing as is
done with GPREG 1206.

2BIT, RES0_SEL[1:0], RES1_SEL, BITOPAB, BITOPAC, GANGOP[1:0],
RES_BIT_DST[9:0], RES_BIT_SRC_A[9:0], RES_BIT_SRC_B[9:0], RES_BIT_SRC_C[9:0],
MASK[31:0], CCSEL[1:0], FALSE are all defined in Table 16.

LD_BDST_CNT, BDST_CNT_MODE are specified in Table 22.
Bulk comparisons

When SPECOP_EN is '1' and LD_NCNT is also '1', the instruction cycle counter N_CNT
1221 is loaded with the value NCNT[6:0] (bits [22:16] of the microword) and the state bit
CMPRN is set. LD_BDST_CNT is required to also be a '1' for this instruction, and
BDST_CNT_MODE must be a '0'. BDST_CNT 1226 is loaded with the value
RES_BIT_DST[9:1]. GPREG 1206 is locked with the A side select RSRCA[2:0] and the B side
select RSRCB[2:0]. The bit CLEAR_HIT is required to be a '1' also in this instruction, which
has the effect of setting the condition code register bits MTCH_A, MTCH_B, MTCH_AORB,
MTCH_AANDB all to zero.

For the next N cycles, until N_CNT 1221 has decremented to zero, interpretation of the
64 bit microword is suppressed and all 64 bits are treated as data instead. In each of these cycles
the microword bits [63:32] are compared to the selected A side register value REGA using
comparator 1220 to produce the result CMPR_A if they are equal; and microword bits [31:0] are
compared to the selected B side register value REGB using comparator 1227 to produce result
CMPR_B if they are equal. During CMPRN the RESVEC unit 1208 is locked into a mode
where 2BIT is true and RES0_SEL and RES1_SEL select CMPR_A, CMPR_B respectively.
The results CMPR_A and CMPR_B are stored to the even odd pair of bits in RESVEC 1208
selected by BDST_CNT 1226, then BDST_CNT 1226 is incremented, N_CNT 1221 is
decremented, and the process repeats until N_CNT 1221 equals zero. At that point the state bits
USE_BDST_CNT and CMPRN clear and the pipeline goes back to normal operation where
every microword is interpreted.

During every comparison cycle of the CMPRN instruction, if CMPR_A is a '1' then the
condition code bit MTCH_A will set and will stay set. Similarly if CMPR_B is a '1' during any
of those cycles then bit MTCH_B will set and will stay set. If either CMPR_A or CMPR_B is
true during any of these cycles then condition code bit MTCH_AORB will set and will stay set.
Finally, if CMPR_A and CMPR_B are both '1' during a CMPRN compare cycle, then
MTCH_AANDB will set and will stay set to indicate that a 64 bit match was encountered.

By loading one or two registers in GPREG 1206 with comparison values prior to
executing the CMPRN instruction, a single value can be compared to (2 * N) values in a table, or
two different values can each be compared to (N) values, in ((2*N) + 1) execution cycles.
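
The data flow of a bulk comparison can be sketched in software. The Python model below is only an illustration of the behavior described above, not the hardware: the function name, the list based RESVEC, and the dictionary of sticky flags are all hypothetical. Each 64 bit table word is split into an upper half compared against REGA and a lower half compared against REGB, and the two result bits are written to successive even odd pairs.

```python
def cmprn(table, reg_a, reg_b, resvec, dst):
    """Model one CMPRN sweep.

    table  : list of 64 bit words that follow the CMPRN microword
    reg_a  : 32 bit A side comparison value
    reg_b  : 32 bit B side comparison value
    resvec : mutable list of bits standing in for RESVEC
    dst    : even starting bit address (BDST_CNT), advanced by 2 per word
    """
    flags = {"MTCH_A": 0, "MTCH_B": 0, "MTCH_AORB": 0, "MTCH_AANDB": 0}
    for word in table:
        cmpr_a = int(((word >> 32) & 0xFFFFFFFF) == reg_a)
        cmpr_b = int((word & 0xFFFFFFFF) == reg_b)
        resvec[dst] = cmpr_a       # even bit of the pair
        resvec[dst + 1] = cmpr_b   # odd bit of the pair
        dst += 2
        # The match flags are sticky: once set they stay set.
        flags["MTCH_A"] |= cmpr_a
        flags["MTCH_B"] |= cmpr_b
        flags["MTCH_AORB"] |= cmpr_a | cmpr_b
        flags["MTCH_AANDB"] |= cmpr_a & cmpr_b
    return flags
```

Loading the same value into both registers corresponds to the single value case in the text, where one value is checked against 2 * N table entries.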

RES_BIT_DST[9:0], RSRCA[3:0], RSRCB[3:0], 2BIT, RES0_SEL, RES1_SEL are
specified in Table 16.

LD_NCNT, LD_BDST_CNT, CLEAR_HIT are specified in Table 22.
Special Operations

In addition to the special operations mentioned so far, there are other administrative
functions which are enabled with SPECOP_EN and decoded from the bits specified in Table 22.
Decode of these functions and any decode necessary for implementing the instruction set
specified take place in the decoder block DCD 1272.
CMEM Fills

The microstore CMEM 1202 is filled either via a series of PIO write accesses from the
Policy Processor 244 or Application Processor 302, or can be loaded by use of the CTRL_FILL
unit 1210. The registers in CTRL_FILL 1210 are loaded with an address in memory 260, an
address in CMEM 1202, and a count of the number of instructions to be loaded. With the CE
pipeline halted, the CTRL_FILL unit will execute this transfer.

The transfer may be initiated by the Policy Processor 244, the Application Processor 302,
or can be initiated by microcode running on the CE, in which case the CTRL_FILL 1210
registers appear as special register destinations as shown in Table 17, and the operation is
triggered with an instruction which has SPECOP_EN equal to '1', and HALT and
DO_CMEM_FILL asserted. After the transfer completes, microcode can then continue
execution, including the newly downloaded code. The CE can only load and launch itself if
microcode to do so is already resident in CMEM 1202 and if the host has configured the CE to
allow it to do so.

HALT and DO_CMEM_FILL are specified in Table 22.
CE Programming Languages

CE programs can be written directly in binary; however, for programmer convenience a
microassembly language uasm has been developed which allows a microword to be constructed
by declaring fields and their values in a symbolic form. The set of common microwords for the
intended use of the CE has also been described in a higher level CE Assembly Language called
masm which allows the programmer to describe operations in a register transfer format and to
describe concurrent operations without having to worry about the details of microcode control of
the underlying hardware. Both of these languages can be used by a programmer or can be
generated automatically from a compiler which translates CE programs from a higher level
language such as NetBoost Classification Language (NCL).

V. Microprogramming Guide

The 64 bit CE instruction word is raw microcode; some bits enable retirement of
operations by writing to one or more units, and the rest are used to steer different data paths and
to provide control codes to various units in parallel. Depending on which results are retired, the
fields in the microword have different meaning. There are 7 different ways that the microword is
interpreted; even though all steering is really done in parallel, these 7 instruction formats show
which sets of fields can be used without conflict.

There are 7 bits that are constant in all formats; these are the bits that enable stores into
various units. These bits are {REG_WE, RESVEC_WE, CC_WE, reserved, PMEM_WE,
BRANCH_EN, and SPECOP_EN}, which are assigned in that order to bits [63:57] of the
microword and are described in Table 16. The remaining bits are assigned to control points as
shown in FIG. 13 and are defined in the following sections.
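
As a quick illustration of the fixed write-enable field, the following Python sketch extracts the seven bits from a 64 bit microword. The helper name is hypothetical; the bit order follows the text (REG_WE at bit 63 down to SPECOP_EN at bit 57).

```python
WRITE_ENABLE_NAMES = (
    "REG_WE",      # bit 63
    "RESVEC_WE",   # bit 62
    "CC_WE",       # bit 61
    "reserved",    # bit 60
    "PMEM_WE",     # bit 59
    "BRANCH_EN",   # bit 58
    "SPECOP_EN",   # bit 57
)


def decode_write_enables(microword: int) -> dict:
    """Map each write-enable name to its bit value in the microword."""
    return {
        name: (microword >> (63 - offset)) & 1
        for offset, name in enumerate(WRITE_ENABLE_NAMES)
    }
```

A microword with only bit 57 set therefore decodes as a special operation with no other store enabled.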

As shown in Figure 14, the CE is implemented as a 3 stage pipeline; each instruction
passes through the three stages Fetch 1302, Decode 1304, and Execute 1306; at any time there
are three different instructions being processed. The figure shows what processes occur in each
stage of the pipeline, and helps illustrate behavior of the pipeline shown in Figure 13. When the
pipeline stalls all three stages stall together in lockstep.

Most word oriented operations pass one operand through either the mask/shift unit or the
split add unit and then all word oriented operations pass through the Execute stage ALU before
being retired. Any consumer of a newly produced GPREG value actually receives a forwarded
copy of the current ALU output via some bypass logic so that there is no delay between creation
of a result and use of it in a subsequent operation. Similarly, use of condition codes for
BRANCH (conditional flow control) or BSET (setting a selected RESVEC bit to the result of a
condition code test), or reads of CC_REG (Condition Code Register) when the bits are being
updated, require bypassing.

Other registers (e.g. BASE_REG) do not have forwarding, so the software must delay one
clock after writing them before using the result.

Microword Format Definitions

MOV, ALU, and LDST operations

REG_WE is set.

These instructions select 1 or 2 sources among GPREG and SPREG, do a mask/shift or
split add of the A side operand, then pass them through the ALU and store the result to an
SPREG or GPREG. Condition codes Z, N, V, SZ, and CY are optionally set by this operation if
CC_WE is set.
Table 9: MOV and ALU formats

[Bit-layout diagram not recoverable from the source. Legible fields: WRITE_ENABLES[6:0], RDST[3:0], ROT[4:0], PINDEX[10:2], RSRCA[3:0], MASK or IMMED.]

Table 10: MOV and ALU formats with PMEM src


Note that with PMEM[immediate_index] as a source the ALU is bypassed (except for
sign and zero detect); however mask/rotate or split add are still available.






[Bit-layout diagram not recoverable from the source. Legible fields: WRITE_ENABLES[6:0], RDST[3:0], ROT, ALUOP[5:0], RSRCB[3:0], RSRCA[3:0], (a), (b), IMMED_ADDR[27:0].]

Table 11: LDST format

(a) SIZE[1:0]

(b) DIR
BIT_OP

Bitops and gangops have RESVEC_WE set. These instructions select a bit
RES_BIT_DST in RESVEC as a destination to which the RES0 result is written; and if
(optionally) 2BIT is set, then RES_BIT_DST is treated as the pointer to an adjacent pair of bits
where the first has an even address and the second has the next (odd) address. With 2BIT the
odd bit is written with the RES1 result.

Depending on the value of the field RES0_SEL, the RES0 result may come from a
boolean operation BITOPAB performed on the operands selected by RES_BIT_SRC_A and
RES_BIT_SRC_B, or the result of a GANG operation performed on bits in the group of 32
RESVEC bits selected by RES_BIT_SRC_A[9:5] and further selected by the "1" bits in the 32
bit immediate MASK field, or the selected and optionally inverted condition code bit selected by
CCSEL and FALSE, or the A side result of a bulk table comparison CMPR_A.

If RES1 is being written to the odd bit of a pair, the RES1 result is selected by
RES1_SEL to be either the result of the arbitrary boolean operation BITOPAC performed on the
operands selected by RES_BIT_SRC_A and RES_BIT_SRC_C, or the B side result of a bulk
table comparison CMPR_B.



[Bit-layout diagram not recoverable from the source. Legible fields: WRITE_ENABLES, (a), (b), RES_BIT_SRC_A[9:0], RES_BIT_DST[9:0], RES_BIT_SRC_C[9:0], BITOPAB[3:0], BITOPAC, RES_BIT_SRC_B[9:0], CC_SEL.]

Table 12: BIT_OP Format

(a) RES0_SEL[1:0]

(b) 2BIT

(c) RES1_SEL

(d) FALSE (selects gender of CCMUX output; 0 = as is, 1 = inverted)

GANG_OP



[Bit-layout diagram not recoverable from the source. Legible fields: WRITE_ENABLES, RES_BIT_DST[9:0], (a), RES_BIT_SRC, MASK.]

Table 13: GANG_OP Format

(a) GANG_OP[1:0]
Branch

BRANCH_EN is always set in this format. Note that a register to register aluop can be
folded into the same instruction as long as there are no other field conflicts.



[Bit-layout diagram not recoverable from the source. Legible fields: WRITE_ENABLES[6:0], RSRC[3:0], (a), (b), CC_SEL, RES_BIT_SRC_C[9:0], BRANCH_ADDR[9:0].]

Table 14: Branch Format

(a) FALSE (selects gender of CCMUX output; 0 = as is, 1 = inverted)

(b) CALL

(c) RET

(d) REG (selects GPREG ('1') or immediate value ('0') for branch)
SPECOP

Special Operation bits (which are all qualified with SPECOP_EN) are defined in
Table 22 on page 99. The instructions cmprn, setpcnt[i], and set_resvec_index also use some
specop fields.

[Bit-layout diagram not recoverable from the source. Legible fields: WRITE_ENABLES, (a), (b), (c), RSRCB[3:0], RES_BIT_DST[9:0] or WCNT[9:0], RSRCA[3:0] or PINDEX[10:2], N_CNT[6:0], MSG[3:0], (*), (?).]

Table 15: SPECOP Format

(a) RES0_SEL[1:0] (for CMPRN)

(b) 2BIT (for CMPRN)

(c) RES1_SEL (for CMPRN)

(*) The interpretation of these bits is defined in Table 22 on page 99.

(?) Undefined but reserved for future special operations
Control Field Definitions

Table 16: Control Fields



Each entry gives the signal, its microword bits, and its function. Bit ranges and names that cannot be read in the source are shown as [illegible].

WE[6:0], bits [63:57]
    These are the fixed format signals which retire results (unless the pipeline is stalled); they are:
    [0] SPECOP_EN: Enables special ops as defined in 9.2.5
    [1] BRANCH_EN: Enables a conditional program flow control operation
    [2] PMEM_WE: Enables stores into PMEM
    [3] reserved
    [4] CC_WE: Enables stores to the condition code bits (list partly illegible; CC_Z, CC_CY, CC_V and CC_N can be read)
    [5] RESVEC_WE: Enables stores to the result bit vector
    [6] REG_WE: Enables stores of ALU_OUT into the GPREG file if (RDST[3] = 0), or into SPREGs if (RDST[3] = 1)

RSRCA[3:0], bits [35:32]
    Selects a GPREG to drive out on DOUT0 (using [2:0]) and selects between GPREG and SPREG sources on the mux to [illegible].

[name illegible, apparently the B side source select], bits [illegible]
    Selects a GPREG to drive out on DOUT1 (using [2:0]) and selects between that and SPREG sources on the ALUB input mux. (Entry partly illegible.)

RDST[3:0], bits [56:53]
    Selects which GPREG to enable the WE onto with [2:0] if [3] = 0; and if [3] = 1, [2:0] is decoded to select which SPREG to write to.

ROT[4:0], bits [50:46]
    Steers the 32 bit barrel shifter.

MSK, bits [illegible]
    If '1' then masking is enabled; if '0' then pass through.

[name illegible], bits [illegible]
    If '1' selects MASK/ROTATE output, if '0' selects SPLIT_ADD output, on ALUA input mux.

ALUOP[5:4], bits [illegible]
    [1x] selects ALUA input as the result (ALU_OUT = ALUA); the reason for this is to enable a MOV from PMEM[index] with mask and rot; but we lose ALUOP due to bit overlays, so we can't use the ALU in the same instruction.
    [00] selects ADDER output
    [01] selects LOGIC output

ALUOP[3:0], bits [13:10]
    On LOGIC unit, these 4 bits are the mux inputs steered by the bit pairs.

ALUOP[1:0], bits [11:10]
    Selects CY_IN to ADDER:
    [00] selects "0"
    [01] selects "1" (for subtracts)
    [1x] selects CC_REG CY

ALUOP[2], bits [illegible]
    If '1', inverts ADDER input on the A port.

ALUOP[3], bits [illegible]
    If '1', inverts ADDER input on the B port.

IMMEDIATE, bits [illegible]
    32 bit immediate value used on ALUB input path; if (RDST = MEM_ADDR) then only bits [27:0] are used.

MASK, bits [illegible]
    32 bit immediate value used in MASK and GANG_OP units for bit masking; AND'ed with the input value.

PINDEX[10:2], bits [44:36]
    Used to address words in PMEM for MOV operations and for loading PCNT for sequential pmem operations. a.k.a. INDEX[8:0]

MEM_SIZE[1:0], bits [31:30]
    [00]: 1 word
    [01]: 2 words (only aligned double word allowed)
    [10]: 4 words (aligned on a 16 byte boundary)
    [11]: 8 word burst (aligned on an 8 word (32 byte) boundary)

MEM_DIR, bits [illegible]
    In LDST format, [1] is a store, [0] is a load from memory.

RES_BIT_SRC_A[9:0], bits [41:32]
    Selects a bit of the 512 bit result vector; bit [9] is not connected, leaves room for future growth. Bits [8:5] select the word to port W0[31:0] on the file. Bits [4:0] select the bit within the word to port B0.

RES_BIT_SRC_B[9:0], bits [31:22]
    Same as above, but to word W1 and bit B1.

RES_BIT_SRC_C[9:0], bits [21:12]
    Same as above, but to word W2 and bit B2.

RES_BIT_DST[9:0], bits [56:47]
    [9] is reserved for future growth. [8:5] are decoded to a row select, and [4:0] are decoded to a column select for enabling the bit write.

RES0_SEL[1:0], bits [16:15]
    Mux select for the DIN0 bit to RESVEC;
    [00]: CMPR_A
    [01]: RES_BIT0
    [10]: RES_GANG
    [11]: COND_CODE as selected by {FALSE, CC_SEL}

RES1_SEL, bits [illegible]
    Mux select for the DIN1 bit to RESVEC, used if 2BIT is set;
    [0]: CMPR_B
    [1]: RES_BIT1

2BIT, bits [illegible]
    Enables next neighbor write to odd numbered bits in RESVEC, for operations with two results (dbitop, cmprn).

BITOPAB[3:0], bits [illegible]
    These bits are selected by {BIT1, BIT0} to provide arbitrary boolean functions on the bits: [00] selects [0], [01] selects [1], [10] selects [2], [11] selects [3].

GANG_OP[1], bits [illegible]
    Mux steering. '1' = AND, '0' = OR

GANG_OP[0], bits [illegible]
    Inverts result if '1' to create NAND or NOR.

BRANCH[9:0], bits [illegible]
    If BRANCH condition passes, this is the signed relative branch offset in CMEM.

CALL, bits [illegible]
    Loads a copy of (PC+1) into the microstack; timed so that the address saved is one past the branch delay slot, and bumps microstack pointer.

RET, bits [illegible]
    Forces the contents of the microstack register into the PC reg and decrements the microstack pointer.

BRANCH_REG, bits [illegible]
    If '1', branch to REG B output on a branch/call; if '0' branch to the immediate value.

FALSE, bits [illegible]
    If '1', invert the output of the CC MUX.

CC_SEL, bits [illegible]
    Selects a condition code bit for a branch decision.

Special ops
    Defined in "SPECOP bit assignments" (Table 22).


Register Select Codes

A side Operands and Destination Registers

Table 17: Register Select Codes for Destinations and for A side Sources

REG[2:0]   REG[3] = 0, Src. or Dst.   REG[3] = 1, Dst.   REG[3] = 1, Src.
0b000      GPREG0 (g0)                NULL (discard)     CC_REG
0b001      GPREG1 (g1)                BASE_REG           BASE_REG
0b010      GPREG2 (g2)                DFIFO_W            DFIFO_R
0b011      GPREG3 (g3)                MEM_ADDR           BASE_REG_MSK
0b100      GPREG4 (g4)                                   PMEM
0b101      GPREG5 (g5)                CEFADR
0b110      GPREG6 (g6)                CESTART
0b111      GPREG7 (g7)                CECNT





B side Operands

Table 18: Register Select Codes for B side Sources

REG[2:0]   REG[3] = 0     REG[3] = 1
0b000      GPREG0 (g0)    IMMEDIATE
0b001      GPREG1 (g1)    IMMED_ADDR[27:0] ([31:28] are 0x0)
0b010      GPREG2 (g2)    DURATION
0b011      GPREG3 (g3)    MEM_WAIT
0b100      GPREG4 (g4)    TIMER
0b101      GPREG5 (g5)    DIAG_REG
0b110      GPREG6 (g6)
0b111      GPREG7 (g7)    RESVEC [1]

[1] Indirect addressing of RESVEC: RESVEC accesses a word of the result vector pointed to by
WCNT (which was loaded via a specop) and then autoincrements the index. After the RESVEC
store to dfifo is completed a resvec_index_unlock must be executed to enable random access to
RESVEC.



ALU and Logic Operations

Adder Op Codes

Table 19: ALUOP Bit Specifications for ADDER (ALUOP[4] = 0)

OPERATION                     ALUOP[3:0]   ALUop name
A + B                         0b0000       ADD
A + B + CY                    0b0010       ADC
A + B + 1                     0b0001       ADINC
A - B                         0b1001       SUB
A - B - ~CY (A + ~B + CY)     0b1010       SUBB
A - B - 1                     0b1000       SBDEC
B - A                         0b0101       SBR (Reverse)
B - A - 1                     0b0100       SBRDEC
B - A - ~CY (~A + B + CY)     0b0110       SBRB
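
The adder encodings in Table 19 follow from the ALUOP fields in Table 16: ALUOP[1:0] selects the carry in, ALUOP[2] inverts the A port, and ALUOP[3] inverts the B port. The Python sketch below models this for 32 bit operands. It is an illustration under those assumptions, with a hypothetical function name, and it does not model the condition code outputs.

```python
MASK32 = 0xFFFFFFFF


def adder(aluop: int, a: int, b: int, cy: int = 0) -> int:
    """Compute the 32 bit ADDER result for a 4 bit ALUOP[3:0] code."""
    a &= MASK32
    b &= MASK32
    # ALUOP[1:0]: 00 -> carry in 0, 01 -> carry in 1, 1x -> saved carry CY
    carry_in = (cy & 1) if (aluop & 0b0010) else (aluop & 0b0001)
    if aluop & 0b0100:   # ALUOP[2] inverts the A port
        a = ~a & MASK32
    if aluop & 0b1000:   # ALUOP[3] inverts the B port
        b = ~b & MASK32
    return (a + b + carry_in) & MASK32
```

For example, SUB (0b1001) computes A + ~B + 1, the two's complement form of A - B, so adder(0b1001, 10, 3) evaluates to 7.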
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£5 Logic Op and BITOP Codes 

Table 20: ALUOP Bit Specifications for LOGIC (ALUOP[4] = 1)

OPERATION      ALUOP[3:0]   ALUop name
AND            0b1000       AND
OR             0b1110       OR
XOR            0b0110       XOR
NAND           0b0111       NAND
NOR            0b0001       NOR
XNOR           0b1001       XNOR
INVERT A       0b0011       INVA
INVERT B       0b0101       INVB
PASS A         0b1100       PASSA
PASS B         0b1010       PASSB
ZERO           0b0000       ZERO
ONES           0b1111       ONES
A AND NOT B    0b0100       AANDNB
B AND NOT A    0b0010       BANDNA
B OR NOT A     0b1011       BORNA
A OR NOT B     0b1101       AORNB



BITOPs and 32 bit Logic operations use the two operand bits as selects into a MUX
which selects among 4 bits provided in the instruction. The encoding for logic operations uses the
value of each pair of operand bits (A,B) to select which bit of ALUOP[3:0] provides the result.
When the logic operation is performed on bit operands from RESVEC the bits (bsrcb, bsrca)
provide the same selection of bits from the BITOP field (that is, for bitopab we use (b1,b0) and
for bitopac we use (b2,b0) as operands):

Operand (b1,b0) or (b2,b0) (or bits of (opA,opB))   BITOP (or ALUOP) bit selected as the result
11                                                  BITOPAx[3]
10                                                  BITOPAx[2]
01                                                  BITOPAx[1]
00                                                  BITOPAx[0]

Condition Code Selects

Each of these values can be tested true or inverted based on bit "F" in the instruction.
Table 21: Condition Code MUX values

Names shown as [illegible] cannot be read in the source. Any negation marks in the CY, Z and N expressions are not legible in the source; the expressions are reproduced as printed.

CC_SEL    Bit            Notes
0b00000   TRUE           For unconditional branch
0b00001   CY             Last saved carry (or a bypass of it if the preceding instruction had CC_WE set)
0b00010   Z              Last saved Zero (or a bypass of it)
0b00011   N              Sign bit of last result (or a bypass of it)
0b00100   V              Signed overflow (CY ^ N) of last result (or a bypass of it)
0b00101   GT             CY && Z (unsigned Greater Than)
0b00110   LT             CY (unsigned Less Than)
0b00111   GE             CY || Z (unsigned Greater Than or Equal)
0b01000   LE             CY || Z (unsigned Less Than or Equal)
0b01001   SZ             STICKY_Z, set via a SPECOP. Each time CC_Z is written, this bit will clear if CC_Z is '0', otherwise it holds its previous value.
0b01010   RX_RING        RX Ring has at least one buffer for this CE
0b01011   RECLASS_RING   Reclassify Ring has at least one buffer for this CE
0b01100   PEND_RD_WAIT   There is a read pending for which some data has not yet arrived in DFIFO_R
0b01101   PEND_WR        DFIFO_W has at least one word in it
0b01110   PEND_ADDR      [description illegible]
0b01111   RES_BIT        Selected bit of Result Vector (using bit2 (port C))
0b10000   MSG_IN_A       These are the message bits from the PP or AP to the microcode
0b10001   MSG_IN_B       indicating that an action is to be taken (CTRL fill, hash insert or
0b10010   MSG_IN_C       delete, etc.). These are assigned by software convention. Note that
0b10011   MSG_IN_D       when a BRANCH_cc is made on any of these bits the associated CC_REG bit will clear when the branch is taken.
0b10100   [illegible]    Z && N (Signed greater than)
0b10101   [illegible]    Z && N (Signed less than)
0b10110   [illegible]    Z || N (Signed greater than or equal)
0b10111   [illegible]    Z || N (Signed less than or equal)
0b11000   PEND_RD_DATA   At least one word is available in DFIFO_R
0b11001   MTCH_AORB      Any A or B side operand matched during a cmprn instruction
0b11010   MTCH_A         Any A side operand matched during a cmprn instruction
0b11011   MTCH_B         Any B side operand matched during a cmprn instruction
0b11100   MTCH_AANDB     Any 64 bit A B pair operand matched during a cmprn instruction



Special Operation Fields

These bits are enabled by SPECOP_EN.

Table 22: SPECOP bit assignments

The bit positions in the Bit column are not legible in the source and are omitted here.

Name                   Description
unlock_pcnt            Puts PCNT counter back into normal immediate P index mode
unlock_resvec_index    Puts RESVEC index counter back into normal immediate mode
inc_rx_index           Increments CE_CONS pointer in this CE's RX ring
inc_reclassify_index   Increments CE_CONS pointer in this CE's RECLASS ring
clear_hit              Clears CC_REG[MTCH_A, MTCH_B, MTCH_AORB, MTCH_AANDB]
clear_duration         Sets the DURATION counter to 0x0
reset_gpreg            Flash clear of GPREG[7:0]
reset_resvec0          Flash clear of RESVEC[31:0]. Allows preservation of up to 32 global bit variables while clearing the rest
reset_resvec15_1       Flash clear of RESVEC[511:32]
set_sz                 Sets CC_REG[SZ] to 1 to start a chained equality compare
do_cmem_fill           Triggers a CMEM fill sequence
halt                   Freezes the CE pipeline (remainder of description illegible)
set_msg[3:0]           Each bit sets one of the 4 MSG_OUT bits in CE_CSR
ld_ncnt                Loads N counter for CMPRN instruction
ld_bdst_cnt            Loads BDST counter, sets RESVEC sequential mode (for CMPRN and resvec spills)
bdst_cnt_mode          '0' = count by 2 for CMPRN, '1' = count by 32 for resvec spill
ld_pcnt                Writes either PINDEX[10:2] or REGB[10:2] into PCNT and sets PCNT autoincrement mode per PCNT_INC
pcnt_reg               With ld_pcnt, '0' = load with immediate, '1' = load from gpreg on B side
pcnt_inc               With ld_pcnt, '1' = pcnt auto increments, '0' = no increments
[name illegible]       Freezes pipeline, sets CECSR[SLEEP], puts CMEM in power down mode. Sleep mode persists until any of CECSR[RX_RING, RECLASS, MSG_IN[D:A]] causes a wakeup.



6. Miscellany

6.1 Memory Scheduling Rules

A memory access is scheduled by writing the address/size/direction to the MEM_ADDR
special register. The following rules apply to scheduling of memory accesses; violation of any of
these rules will cause the pipeline to HALT with status of the cause of the error in the CE Control
and Status Register (CECSR).

1) There must be at least one intervening instruction between a LD and use of the
resulting data if no other read data is outstanding. A load followed by immediate consumption
when the outstanding schedule is '0' will result in a deadlock.

2) A maximum of 16 slots of read data can be scheduled. A slot is a 2 word entry in
DFIFO_R. A LD or LD2 consumes 1 slot, a LD4 consumes 2 slots, and a LD8 consumes 4 slots
in DFIFO_R. The appropriate number of slots must be available before another (LD, LD2, LD4,
LD8) is scheduled.

3) A maximum of 32 outstanding words of read data can be scheduled; data must be
consumed to make room in DFIFO_R before more can be scheduled.

4) Precisely the correct number of words of write data must be written to DFIFO_W prior
to scheduling the store of that size.
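
The slot accounting of rules 2 and 3 can be illustrated with a small software model. The following Python sketch is not part of the described hardware: the names ReadScheduler, PipelineHalt, schedule and consume are hypothetical, and the sketch assumes that read data is consumed in whole slots.

```python
# Illustrative model only; names and the whole-slot consumption rule are assumptions.
SLOT_WORDS = 2    # a slot is a 2 word entry in DFIFO_R
MAX_SLOTS = 16    # rule 2: at most 16 slots, i.e. 32 words (rule 3)
SLOTS_PER_LOAD = {"LD": 1, "LD2": 1, "LD4": 2, "LD8": 4}


class PipelineHalt(Exception):
    """Stands in for the pipeline HALT that is reported through the CECSR."""


class ReadScheduler:
    def __init__(self):
        self.used_slots = 0

    def schedule(self, op):
        """Reserve DFIFO_R slots for a load, or halt if too few are free."""
        need = SLOTS_PER_LOAD[op]
        free = MAX_SLOTS - self.used_slots
        if need > free:
            raise PipelineHalt(f"{op} needs {need} slot(s), only {free} free")
        self.used_slots += need

    def consume(self, slots=1):
        """Release slots once their read data has been consumed."""
        if slots > self.used_slots:
            raise PipelineHalt("no scheduled read data to consume")
        self.used_slots -= slots

    @property
    def outstanding_words(self):
        return self.used_slots * SLOT_WORDS
```

In this model, four LD8 operations fill DFIFO_R completely, and a further load is only accepted after at least one slot has been consumed.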

6.2 Register Write Use Rules

GPREG and RESVEC results can safely be accessed in the instruction after the data is
written to them.

PCNT, WCNT, and NCNT are all loaded via use of a specop. They can safely be used
immediately in the next instruction.

The specop unlock_pcnt takes effect immediately, so PMEM immediate index can safely
be used in the next instruction. Likewise, specop unlock_resvec_index takes effect immediately,
and random access to RESVEC can be used in the next instruction.

BASEREG has a one cycle write use delay rule; if it is written to in instruction A, it
cannot be used as a source operand in instruction A+1.

PMEM has a one cycle write use delay rule for any particular address. If address addr is
written to in instruction A, then addr may not be read in instruction A+1; however it is perfectly
safe to read any other location in PMEM in cycle A+1.

Data written to special register NULL may not be read back because, well, it's gone, man.

6.3 PMEM Addressing

Packet Memory PMEM can be addressed by an immediate index provided in the
microword, indirectly from the PCNT register, or indirectly with auto increment of PCNT.
Immediate indexing is the standard mode; use of PCNT is initiated with the ld_pcnt special
operation, which also carries the mode bit pcnt_inc that can optionally be asserted. This special
operation sets the state bits USE_PCNT and (optionally) PCNT_INC_MODE. USE_PCNT is
cleared by the special operation unlock_pcnt.

PCNT can be loaded from an immediate value PINDEX provided in the ld_pcnt special
operation, or from bits [10:2] of any GPREG specified in RSRCB if the specop bit pcnt_reg is
set during the ld_pcnt.
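
The three PMEM addressing modes can be modeled in a few lines. The Python sketch below is illustrative only: the class PmemIndex and its method names are hypothetical, and the wrap-around of PCNT at 9 bits on auto increment is an assumption that the text does not state.

```python
# Illustrative model only; PmemIndex and the 9-bit wrap-around are assumptions.
PCNT_MASK = 0x1FF  # bits [10:2] of a value form a 9-bit quantity


def bits_10_2(value):
    """Return bits [10:2] of value."""
    return (value >> 2) & PCNT_MASK


class PmemIndex:
    def __init__(self):
        self.pcnt = 0
        self.use_pcnt = False        # state bit USE_PCNT
        self.pcnt_inc_mode = False   # state bit PCNT_INC_MODE

    def ld_pcnt(self, pindex=0, gpreg_b=0, pcnt_reg=False, pcnt_inc=False):
        """Load PCNT from PINDEX[10:2], or from a GPREG[10:2] when pcnt_reg is set."""
        source = gpreg_b if pcnt_reg else pindex
        self.pcnt = bits_10_2(source)
        self.use_pcnt = True
        self.pcnt_inc_mode = pcnt_inc

    def unlock_pcnt(self):
        """Return to immediate indexing."""
        self.use_pcnt = False

    def address(self, immediate_index):
        """Return the PMEM index used by the current instruction."""
        if not self.use_pcnt:
            return immediate_index
        addr = self.pcnt
        if self.pcnt_inc_mode:
            self.pcnt = (self.pcnt + 1) & PCNT_MASK  # assumed wrap-around
        return addr
```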

6.4 Microstack

The microstack is written and the stack pointer is incremented every time a conditional
CALL instruction succeeds. It is read and the stack pointer is decremented every time a
conditional RET instruction succeeds. The address written is the address of the instruction
following the delay slot of the call, since the delay slot is always executed. The microstack holds
up to 8 entries. Calling to a depth greater than 8, or returning past the valid number of entries,
causes a halt with a report of STACK_ERROR in the CECSR.
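
The microstack behavior can be sketched in software as follows. This Python model is illustrative only: the names Microstack and StackError are hypothetical, and it assumes that instruction addresses advance by one and that a call has exactly one delay slot, so that the saved return address is the call address plus two.

```python
# Illustrative model only; address arithmetic (call address + 2) is an assumption.
MICROSTACK_DEPTH = 8


class StackError(Exception):
    """Stands in for a halt that reports STACK_ERROR in the CECSR."""


class Microstack:
    def __init__(self):
        self._entries = []

    def call(self, call_addr, taken=True):
        """Model a conditional CALL; only a successful call pushes an entry."""
        if not taken:
            return
        if len(self._entries) == MICROSTACK_DEPTH:
            raise StackError("call depth greater than 8")
        # call_addr + 1 is the delay slot, which is always executed,
        # so execution resumes at the instruction after it.
        self._entries.append(call_addr + 2)

    def ret(self, taken=True):
        """Model a conditional RET; only a successful return pops an entry."""
        if not taken:
            return None
        if not self._entries:
            raise StackError("return past the valid number of entries")
        return self._entries.pop()
```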

VI. Programming Model

This section describes the programming model and set of abstractions employed when
creating an application for a platform (i.e., the platform described in this patent
application) such as the sample NetBoost platform illustrated in FIGs. 1-4. An application on
the NetBoost platform is to be considered a service, provided within the network, that may
require direct knowledge or manipulation of network packets or frames. The programming
model provides for direct access to low-level frame data, plus a set of library functions capable
of reassembling low-level frame data into higher-layer messages or packets. In addition, the
library contains functions capable of performing protocol operations on network or
transport-layer messages.

An application developed for the NetBoost platform receives link-layer frames from an
attached network interface, matches the frames against some set of selection criteria, and
determines their disposition. Frame processing takes place as a sequence of serialized processing
steps. Each step includes a classification and action phase. During the classification phase,
frame data is compared against application-specified matching criteria called rules. When a rule's
matching criteria evaluate true, its action portion specifies the disposition of the frame.
Execution of the action portion constitutes the action phase. Only the actions of rules with true
matching criteria are executed.

Implementing an application for the NetBoost platform involves partitioning the 
application into two modules. Modules are a grouping of application code destined to execute in 
a particular portion of the NetBoost platform. There are two modules required: the application 
processor (AP) module, and the policy engine (PE) module. Application code in the AP module 
runs on the host processor, and is most appropriate for processing not requiring wire-speed 
access to network frames. Application code for the PE module comprises the set of classification 
rules written in the NetBoost Classification Language (NCL), and an accompanying set of 
compiled actions (C or C++ functions/objects). PE actions are able to manipulate network 
frames with minimal overhead, and are thus the appropriate mechanism for implementing fast 
and simple manipulation of frame data. The execution environment for PE action code is more
restricted than that of AP code (no virtual memory or threads), but includes a library providing
efficient implementation for common frame manipulation tasks (see Section VIII). A message
passing facility allows for communication between PE action code and the AP module.

1. Application Structure

FIG. 5 illustrates the NetBoost application structure.

Applications 1402 written for the NetBoost platform must be partitioned into the
following modules and sub-modules, as illustrated in FIG. 5.

• AP Module (application processor (host) module) 1406

• PE Module (policy engine module) 1408

    • Classification rules - specified in NCL

    • Action implementation - object code provided by app developer

The AP module 1406 executes in the programming environment of a standard operating 
system and has access to all PEs 1408 available on the system, plus the conventional APIs 
implemented in the host operating system. Thus, the AP module 1406 has the capability of 
performing both frame-level processing (in conjunction with the PE), or traditional network 
processing using a standard API. 

The PE module 1408 is subdivided into a set of classification rules and actions.
Classification rules are specified in the NetBoost Classification Language (NCL) and are 
compiled on-the-fly by a fast incremental compiler provided by NetBoost. Actions are 
implemented as relocatable object code provided by the application developer. A dynamic 
linker/loader included with the NetBoost platform is capable of linking and loading the 
classification rules with the action implementations and loading these either into the host 
(software implementation) or hardware PE (hardware implementation) for execution. 

The specific division of functionality between AP and PE modules 1406 and 1408 in an 
application is left entirely up to the application designer. Preferably, the AP module 1406 should 
be used to implement initialization and control, user interaction, exception handling, and 
infrequent processing of frames requiring special attention. The PE module 1408 preferably 
should implement simple processing on frames (possibly including the reconstruction of
higher-layer messages) requiring extremely fast execution. PE action code runs in a run-to-completion
real-time environment without memory protection, similar to an interrupt handler in most 
conventional operating systems. Thus, functions requiring lengthy processing times should be 
avoided, or executed in the AP module 1406. In addition, other functions may be loaded into the 
PE to support actions, asynchronous execution, timing, or other processing (such as 
upcalls/downcalls, below). All code loaded into the PE has access to the PE runtime 
environment, provided by the ASL. 

The upcall/downcall facility provides for communication between PE actions and AP 
functions. An application may use upcalls/downcalls for sharing information or signaling 
between the two modules. The programmer may use the facility to pass memory blocks, frame 
contents, or other messages constructed by applications in a manner similar to asynchronous 
remote procedure calls. 
2. Basic Building Blocks 

This section describes the C++ classes needed to develop an application for the NetBoost 
platform. Two fundamental classes are used to abstract the classification and handling of 
network frames: 

• ACE, representing classification and action steps

• Target, representing possible frame destinations 

2.1 ACEs 

The ACE class (short for Action-Classification-Engine) abstracts a set of frame 
classification criteria and associated actions, upcall/downcall entrypoints, and targets. They are 
simplex: frame processing is uni-directional. An application may make use of cascaded ACEs to 
achieve serialization of frame processing. ACEs are local to an application.

ACEs provide an abstraction of the execution of classification rules, plus a container for
holding the rules and actions. ACEs are instantiated on particular hardware resources either by 
direct control of the application or by the plumber application. 

An ACE 1500 is illustrated in FIG. 6.

The ACE is the abstraction of frame classification rules 1506 and associated actions
1508, destinations for processed frames, and downcall/upcall entrypoints. An application may
employ several ACEs, which are executed in a serial fashion, possibly on different hardware
processors.

FIG. 6 illustrates an ACE with two targets 1502 and 1504. The targets represent
possible destinations for frames and are described in the following section.

Frames arrive at an ACE from either a network interface or from another ACE. The ACE
classifies the frame according to its rules. A rule is a combination of a predicate and an action. A rule
is said to be "true" or to "evaluate true" or to be a "matching rule" if its predicate portion
evaluates true in the Boolean sense for the current frame being processed. The action portion of
each matching rule indicates what processing should take place.

The application programmer specifies rule predicates within an ACE using Boolean 
operators, packet header fields, constants, set membership queries, and other operations defined 
in the NetBoost Classification Language (NCL), a declarative language described below in
Section VII. A set of rules (an NCL program) may be loaded or unloaded from an ACE
dynamically under application control. In certain embodiments, the application developer
implements actions in a conventional high level language. Special external declaration
statements in NCL indicate the names of actions supplied by the application developer to be
called as the action portion for matching rules.

Actions are function entry-points implemented according to the calling conventions of
the C programming language (static member functions in C++ classes are also supported). The 
execution environment for actions includes a C and C++ runtime environment with restricted 
standard libraries appropriate to the PE execution environment. In addition to the C environment, 
the ASL library provides added functionality for developing network applications. The ASL 
provides support for handling many TCP/IP functions such as IP fragmentation and re-assembly, 
Network Address Translation (NAT), and TCP connection monitoring (including stream 
reconstruction). The ASL also provides support for encryption and basic system services (e.g. 
timers, memory management). 

During classification, rules are evaluated first-to-last. When a matching rule is
encountered, its action executes and returns a value indicating whether it disposed of the frame.
Disposing of a frame corresponds to taking the final desired action on the frame for a single 
classification step (e.g. dropping it, queueing it, or delivering it to a target). If an action executes 
but does not dispose of the current frame, it returns a code indicating the frame should undergo 
further rule evaluations in the current classification step. If any action disposes of the frame, the 
classification phase terminates. If all rules are evaluated without a disposing action, the frame is 
delivered to the default target of the ACE. 
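
The classification step just described can be sketched as a short loop. The following Python model is illustrative only and is not the ACE class of the application library: the function classify, the frame dictionaries and the sample actions are hypothetical, and an action is assumed to return True when it disposes of the frame.

```python
# Illustrative model only; classify() and the sample rules are hypothetical.
def classify(frame, rules, default_target):
    """Evaluate rules first-to-last; return True if some action disposed of the frame."""
    for predicate, action in rules:
        # Only the actions of rules with true predicates are executed.
        if predicate(frame) and action(frame):
            return True            # a disposing action ends the classification phase
    default_target(frame)          # no action disposed of the frame
    return False


log = []


def count_tcp(frame):
    log.append("count_tcp")
    return False                   # frame undergoes further rule evaluation


def drop_telnet(frame):
    log.append("drop_telnet")
    return True                    # frame is disposed of


def default_target(frame):
    log.append("default")


rules = [
    (lambda f: f["proto"] == 6, count_tcp),
    (lambda f: f["dport"] == 23, drop_telnet),
]
```

A TCP frame to port 23 runs both actions and is disposed of by the second; a frame matching neither rule falls through to the default target.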
2.2 Targets 

Targets specify possible destinations for frames (an ACE or network interface). A target 
is said to be bound to either an ACE or network interface (in the outgoing direction), otherwise it 
is unbound. Frames delivered to unbound targets are dropped. Target bindings are manipulated 
by a plumbing application in accordance with the present invention.

FIG. 7 shows a cascade of ACEs. ACEs use targets as frame destinations. Targets 1
and 2 (illustrated at 1602 and 1604) are bound to ACEs 1 and 2 (illustrated at 1610 and 1612),
respectively. Target 3 (at 1606) is bound to a network interface (1620) in the outgoing direction.
Processing occurs serially from left to right. Ovals indicate ACEs, hexagons indicate network
interfaces. Outgoing arcs indicate bound targets. An ACE with multiple outgoing arcs indicates
an ACE that performs a demultiplexing function: the set of outgoing arcs represents the set of all
frame destinations in the ACE, across all actions. In this example, each ACE has a single
destination (the default target). When several hardware resources are available for executing 
ACEs (e.g. in the case of the NetBoost hardware platform), ACEs may execute more efficiently 
(using pipelining). Note, however, that when one ACE has finished processing a frame, it is 
given to another ACE that may execute on the same hardware resource. 
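
The binding of targets in such a cascade can be sketched as follows. This Python model is illustrative only: the Target class shown here, the make_ace helper and the use of a list as the outgoing network interface are hypothetical stand-ins, and the targets are bound directly rather than through a plumbing application.

```python
# Illustrative model only; Target, make_ace and the list-based interface are hypothetical.
class Target:
    def __init__(self, name):
        self.name = name
        self.destination = None      # unbound until bind() is called

    def bind(self, destination):
        """Bind the target to an ACE or to an outgoing network interface."""
        self.destination = destination

    def deliver(self, frame):
        """Deliver a frame; frames delivered to unbound targets are dropped."""
        if self.destination is None:
            return False
        self.destination(frame)
        return True


def make_ace(tag, default_target):
    """Return an ACE-like callable that tags the frame and forwards it."""
    def ace(frame):
        default_target.deliver(frame + [tag])
    return ace


wire = []                            # stands in for the outgoing network interface
target1, target2, target3 = Target("1"), Target("2"), Target("3")
ace1 = make_ace("ACE1", target2)
ace2 = make_ace("ACE2", target3)
target1.bind(ace1)
target2.bind(ace2)
target3.bind(wire.append)
```

Delivering a frame to target 1 passes it through ACE 1 and ACE 2 in order and then to the interface; delivering to a target that was never bound drops the frame.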
3. Complex Configurations 

As described above, a single application may employ more than one ACE. Generally, 
processing bidirectional network data will require a minimum of two ACEs. Four ACEs may be 
a common configuration for a system providing two network interfaces and an application 
wishing to install ACEs at the input and output for each interface (e.g. in the NetBoost hardware 
environment with one PE).

FIG. 8 illustrates an application employing six ACEs 1802, 1804, 1806, 1808, 1810
and 1812. Shaded circles represent targets. Two directions of processing are depicted, as well as 
an ACE with more than one output arc and an ACE with more than one input arc. The arcs 
represent possible destinations for frames. 

An ACE depicted with more than one outgoing arc may represent the processing of a 
single frame, or in certain circumstances, the replication (copying) of a frame to be sent to more
than one downstream ACE simultaneously. Frame replication is used in implementing broadcast
and multicast forwarding (e.g. in layer 2 bridging and IP multicast forwarding). The 
interconnection of targets to downstream objects is typically performed by the plumber 
application described in the next section. 
4. Software Architecture 

This section describes the major components comprising the NetBoost software 
implementation. The software architecture provides for the execution of several applications 
performing frame-layer processing of network data, and includes user-level, kernel-level, and 
embedded processor-level components (for the hardware platform). The software architecture is
illustrated in FIG. 9.

The layers of software comprising the overall architecture are described bottom-up. The 
first layer is the NetBoost Policy Engine 2000 (PE). Each host system may be equipped with one 
or more PEs. In systems equipped with NetBoost hardware PEs, each PE will be equipped with 
several frame classifiers and a processor responsible for executing action code. For systems 
lacking the hardware PE, all PE functionality is implemented in software. The PE includes a set 
of C++ library functions comprising the Action Services Library (ASL) which may be used by 
action code in ACE rules to perform messaging, timer-driven event dispatch, network packet 
reassembly or other processing. 

The PE interacts with the host system via a device driver 2010 and ASL 2012 supplied by 
NetBoost. The device driver is responsible for supporting maintenance operations to NetBoost 
PE cards. In addition, this driver is responsible for making the network interfaces supplied on 
NetBoost PE cards available to the host system as standard network interfaces. Also, specialized 
kernel code is inserted into the host's protocol stack to intercept frames prior to receipt by the
host protocol stack (incoming) or transmission by conventional network interface cards
(outgoing).

The Resolver 2008 is a user-level process started at boot time responsible for managing 
the status of all applications using the NetBoost facilities. In addition, it includes the NCL 
compiler and PE linker/loader. The process responds to requests from applications to set up 
ACEs, bind targets, and perform other maintenance operations on the NetBoost hardware or 
software-emulated PE. 

The Application Library 2002 (having application 1, 2 & 3 shown at 2020, 2040, 2042) is 
a set of C++ classes providing the API to the NetBoost system. It allows for the creation and 
configuration of ACEs, binding of targets, passing of messages to/from the PE, and the 
maintenance of the name-to-object bindings for objects which exist in both the AP and PE 
modules. 

The plumber 2014 is a management application used to set up or modify the bindings of 
every ACE in the system (across all applications). It provides a network administrator the ability 
to specify the serial order of frame processing by binding ACE targets to subsequent ACEs. The 
plumber is built using a client/server architecture, allowing for both local and remote access to 
specify configuration control. All remote access is authenticated and encrypted. 

VII. Classification Language

The NetBoost Classification Language (NCL) is a declarative high level language for 
defining packet filters. The language has six primary constructs: protocol definitions, predicates, 
sets, set searches, rules and external actions. Protocol definitions are organized in an
object-oriented fashion and describe the position of protocol header fields in packets. Predicates are
Boolean functions on protocol header fields and other predicates. Rules consist of a
predicate/action pair having a predicate portion and an action portion where an action is invoked
if its corresponding predicate is true. Actions refer to procedure entrypoints implemented
external to the language.

Individual packets are classified according to the predicate portions of the NCL rules. 
More than one rule may be true for any single packet classification. The action portions of rules
with true predicates are invoked in the order the rules have been specified. Any of these actions
invoked may indicate that no further actions are to be invoked. NCL provides a number of 
operators to access packet fields and execute comparisons of those fields. In addition, it provides 
a set abstraction, which can be used to determine containment relationships between packets and 
groups of defined objects (e.g. determining if a particular packet belongs to some TCP/IP flow or 
set of flows), providing the ability to keep persistent state in the classification process between 
packets. 

Standard arithmetic, logical and bit-wise operators are supported and follow their
equivalents in the C programming language. These operators provide operations on the fields of
the protocol headers and result in scalar or Boolean values. An include directive allows for
splitting NCL programs into several files.
1. Names and Data Types 

The following definitions in NCL have names: protocols, predicates, fields,
sets, searches on sets, and rules (defined in later sections). A name is formed using any
combination of alphanumeric characters and underscores, except that the first character must be an
alphabetic character. Names are case sensitive. For example,

set_tcp_udp
IsIP
isIPv6
set_udp_ports

The above examples are all legal names. The following examples are all illegal names:

6_byte_ip
set_tcp+udp
ip_src&dst

The first is illegal because it starts with a numeric character; the other two are illegal because
they contain operators.
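
The naming rule can be expressed as a single regular expression. The following Python sketch is illustrative only; the function name is_legal_name is hypothetical, and the sketch assumes that "alphabetic" and "alphanumeric" refer to ASCII letters and digits.

```python
import re

# Illustrative check only; assumes ASCII letters and digits.
# First character alphabetic, then any mix of letters, digits and underscores.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_legal_name(name):
    """Return True if name satisfies the NCL naming rule described above."""
    return _NAME_RE.fullmatch(name) is not None
```

Because names are case sensitive, IsIP and isIP would be two distinct legal names.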

Protocol fields (see Section 6) are declared in byte-oriented units, and used in
constructing protocol definitions. All values are big-endian. Fields specify the location and size
of portions of a packet header. All offsets are relative to a particular protocol. In this way it is
possible to specify a particular header field without knowing the absolute offset of any
particular protocol header. Mask and shift operations support the accessing of non-byte-sized
header fields. For example,

dst { ip[16:4] }
ver { (ip[0:1] & 0xf0) >> 4 }

In the first line, the 4-byte field dst is specified as being at byte offset 16 from the beginning of
the IP protocol header. In the second example, the field ver is a half-byte sized field at the
beginning of the IP header.
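
The effect of these two field definitions can be reproduced on a raw header. The Python sketch below is illustrative only: the helper name field and the sample 20-byte IP header are hypothetical, and the sketch simply reads big-endian byte ranges and applies the same mask and shift.

```python
# Illustrative helper only; field() and the sample header are hypothetical.
def field(header, offset, size):
    """Big-endian value of `size` bytes at `offset`, like the NCL form proto[offset:size]."""
    chunk = header[offset:offset + size]
    if len(chunk) != size:
        raise ValueError("field extends past the end of the header")
    return int.from_bytes(chunk, "big")


# Sample IPv4 header: version 4, header length 5 words, destination 101.230.135.45.
ip = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 6, 0, 0,
            10, 4, 7, 18, 101, 230, 135, 45])

dst = field(ip, 16, 4)                 # dst { ip[16:4] }
ver = (field(ip, 0, 1) & 0xF0) >> 4    # ver { (ip[0:1] & 0xf0) >> 4 }
```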
2. Operators 

Arithmetic, logical and bit-wise binary operators are supported. Table 23 lists the
arithmetic operators and grouping operator supported:

Operator    Description
()          Grouping operator
+           Addition
-           Subtraction
<<          Logical left shift
>>          Logical right shift

Table 23: Arithmetic operators

The arithmetic operators result in scalar quantities, which are typically used for comparison.
These operators may be used in field and predicate definitions. The shift operations do not
support arithmetic shifts. The shift amount is a compile time constant. Multiplication, division
and modulo operators are not supported. The addition and subtraction operations are not
supported for fields greater than 4 bytes.

Logical operators are supported that result in Boolean values. Table 24 provides the
logical operators that are supported by the language.

Operator    Description
&&          Logical AND
||          Logical OR
!           Not
>           Greater Than
>=          Greater Than or Equal To
<           Less Than
<=          Less Than or Equal To
==          Equal To
!=          Not Equal

Table 24: Logical operators



Bit-wise operators are provided for masking and setting of bits. The operators supported are as
follows:

Operator    Description
&           Bit-wise AND
|           Bit-wise OR
^           Bit-wise Exclusive OR
~           Bit-wise One's Complement

Table 25: Bit-wise operators

The precedence and the associativity of all the operators listed above are shown in Table 26. The
precedence is listed in decreasing order.

Precedence   Operators     Associativity
High         ()            Left to right
             ! ~           Right to left
             + -           Left to right
             << >>         Left to right
             < <= > >=     Left to right
             == !=         Left to right
             &             Left to right
             ^             Left to right
             |             Left to right
             &&            Left to right
Low          ||            Left to right

Table 26: Operator precedence



3. Field Formats 

The language supports several standard formats, and also domain specific formats, for
constants, including the dotted-quad form for IP version 4 addresses and colon-separated
hexadecimal for Ethernet and IP version 6 addresses, in addition to conventional decimal and
hexadecimal constants. Standard hexadecimal constants are defined as they are in the C
language, with a leading 0x prefix.

For data smaller than 4 bytes in length, unsigned extension to 4 bytes is performed
automatically. A few examples are as shown below:

0x11223344            Hexadecimal form
101.230.135.45        Dot separated IP address form
ff:12:34:56:78:9a     Colon separated MAC address form

Table 27: Constant formats
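
The constant formats of Table 27 can each be reduced to a big-endian integer. The following Python sketch is illustrative only and is not the NCL compiler's parser: the function name parse_constant is hypothetical, and the colon-separated branch assumes fully written two-digit groups such as a MAC address (abbreviated IP version 6 notation is not handled).

```python
# Illustrative parser only; parse_constant is hypothetical and handles
# hexadecimal, dotted-quad, colon-separated byte groups and decimal forms.
def parse_constant(text):
    """Return the integer value of an NCL-style constant."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    if ":" in text:
        octets = bytes(int(part, 16) for part in text.split(":"))
        return int.from_bytes(octets, "big")
    if "." in text:
        parts = [int(part) for part in text.split(".")]
        if len(parts) != 4:
            raise ValueError("dotted-quad form needs exactly four parts")
        return int.from_bytes(bytes(parts), "big")
    return int(text, 10)
```

For example, the dotted-quad form 101.230.135.45 and the hexadecimal form 0x65e6872d denote the same 4-byte value.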

4. Comments 

C and C++ style comments are supported. One syntax supports multiple lines, the other
supports comments terminating with a newline. The syntax for the first form follows the C
language comment syntax using /* and */ to demark the start and end of a comment,
respectively. The syntax for the second form follows the C++ comment syntax, using // to
indicate the start of the comment. Such comments end at the end of the line. Nesting of
comments is not allowed in the case of the first form. In the second case, everything is discarded
to the end of the line, so nesting of the second form is allowed. Comments can occur anywhere
in the program. A few examples of comments are shown below.

/* Comment in a single line */
// Second form of the comment: compiler ignores to end-of-line
/* Comments across multiple lines
   second line
   third line */
// Legal comment // still ignored to end-of-line
/* First form // Second form, but OK
*/

Diagram 1: Legal comments



The examples above are all legal. The examples shown in Diagram 2 (below) are
illegal.



/ * space */
/              new-line
* Testing */
/* Nesting /* Second level */
*/
/ / space
/              new-line
/
// /* Nesting
*/

Diagram 2: Illegal comments



The first comment is illegal because of the space between / and *, and the second one because
of the new-line. The third is illegal because of nesting. The fourth is illegal because of the space
between the / chars, and the next one because of the new-line. The last one is illegal because
the /* is ignored, causing the */ to be in error: it is an attempt to nest the first form of the
comment in the second form.

5. Constant Definitions and Include Directives

The language provides user-definable symbolic constants. The syntax for the definition
is the keyword #define, then the name followed by the constant. No spaces are allowed
between # and define. The constant can be in any of the forms described in the next
subsection of this patent application. The definition can start at the beginning of a line or any
other location on a line as long as the preceding characters are either spaces or tabs. For example,

#define TELNET_PORT_NUM 23 // Port number for telnet 

#define IP_ADDR 10.4.7.18 
#define MAC_ADDR c5:ee:f0:34:74:93

Diagram 3: Sample of constant definition usage 

The language provides the ability to include files within the compilation unit so that pre-existing
code can be reused. The keyword #include is used, followed by the filename enclosed in
double quotes. The # must start on a new line, but may have spaces immediately preceding the
keyword. No spaces are allowed between # and include. The filename is any legal
filename supported by the host. For example,

#include "myproto.def" // Could be protocol definitions
#include "stdrules.rul" // Some standard rules
#include "newproto.def" /* New protocol definitions */

Diagram 4: Sample include directives 

6. Protocol Definitions 

NCL provides a convenient method for describing the relationship between multiple
protocols and the header fields they contain. A protocol defines fields within a protocol header,
intrinsics (built-in functions helpful in processing headers and fields), predicates (Boolean
functions on fields and other predicates), and the demultiplexing method to higher-layer protocols.
The keyword protocol identifies a protocol definition and its name. The name may later be
referenced as a Boolean value which evaluates true if the protocol is activated (see 6.2). The
declarations for fields, intrinsics and demultiplexing are contained in a protocol definition as
illustrated below.

6.1 Fields 

Fields within the protocol are declared by specifying a field name followed by the offset
and field length in bytes. Offsets are always defined relative to a protocol. The base offset is
specified by the protocol name, followed by colon separated offset and size enclosed in square
brackets. This syntax is as shown below:

field_name { protocol_name[offset:size] }

Fields may be defined using a combination of byte ranges within the protocol header and
shift/mask or grouping operations. The field definitions act as access methods to the areas
within the protocol header or payload. For example, fields within a protocol named MyProto
might be specified as follows:

dest_addr { MyProto[6:4] }
bit_flags { (MyProto[10:2] & 0x0ff0) >> 8 }

In the first example, field dest_addr is declared as a field at offset 6 bytes from the start of the
protocol MyProto and 4 bytes in size. In the second example, the field bit_flags is a bit
field; because it crosses a byte boundary, two bytes are used in conjunction with a mask and right
shift operation to get the field value.

6.2 Intrinsics 

Intrinsics are functions listed in a protocol statement, but implemented internally.
Compiler-provided intrinsics are declared in the protocol definition (for consistency) using the
keyword intrinsic followed by the intrinsic name. Intrinsics provide convenient or highly
optimized functions that are not easily expressed using the standard language constructs. One
such intrinsic is the IP checksum. Intrinsics may be declared within the scope of a protocol
definition or outside, as in the following examples:

protocol foo {
    ...field defs...
    intrinsic chksumvalid { }
}

intrinsic now

Diagram 5: Sample intrinsic declarations

The first example indicates that the chksumvalid intrinsic is associated with the protocol foo. Thus,
the expression foo.chksumvalid could be used in the creation of predicates or expressions
defined later. The second example indicates a global intrinsic called now that may be used
anywhere within the program. Intrinsics can return Boolean and scalar values.

In a protocol definition, predicates are used to define frequently used Boolean results 
from the fields within the protocol being defined. They are identified by the keyword predicate. 
Predicates are described in section 7. 
6.3 Demux 

The keyword demux in each protocol statement indicates how demultiplexing should be 
performed to higher-layer protocols. In effect, it indicates which subsequent protocol is 
"activated", as a function of fields and predicates defined within the current set of activated 
protocols. 

Evaluation of the Boolean expressions within a protocol demux statement determines 
which protocol is activated next. Within a demux statement, the first expression that evaluates 
to true indicates that the associated protocol is to be activated at a specified offset relative to the 
first byte of the present protocol; all subsequent cases are ignored. The starting offset of the 
protocol to be activated is specified using the keyword at. A default protocol may be specified 
using the keyword default. The syntax for the demux is as follows: 

demux { 

    boolean_exp { protocol_name at offset } 
    default { protocol_name at offset } 

} 

Diagram 6: Demux syntax sample 

Diagram 7 shows an example of the demux declaration. 

demux { 

    (length = 10) { proto_a at offset_a } 
    (flags && predicate_x) { proto_b at offset_b } 
    default { proto_default at offset_default } 

} 

Diagram 7: Sample protocol demux 

In the above example, protocol proto_a is "activated" at offset offset_a if the expression 
length equals ten. Protocol proto_b is activated at offset offset_b if flags is true, 
predicate_x is true and length is not equal to 10. predicate_x is a pre-defined Boolean 
expression. The default protocol is proto_default, which is defined here so that packets not 
matching the predefined criteria can be processed. The fields and predicates in a protocol are 
accessed by specifying the protocol and the field or predicate separated by the dot operator. This 
hierarchical naming model facilitates easy extension to new protocols. Consider the IP protocol 
example shown below. 



protocol ip { 

    vers        { (ip[0:1] & 0xf0) >> 4 } 
    hlength     { ip[0:1] & 0x0f } 
    hlength_b   { hlength << 2 } 
    tos         { ip[1:1] } 
    length      { ip[2:2] } 
    id          { ip[4:2] } 
    flags       { (ip[6:1] & 0xe0) >> 5 } 
    fragoffset  { ip[6:2] & 0x1fff } 
    ttl         { ip[8:1] } 
    proto       { ip[9:1] } 
    chksum      { ip[10:2] } 
    src         { ip[12:4] } 
    dst         { ip[16:4] } 

    intrinsic chksumvalid { } 
    intrinsic genchksum { } 

    predicate bcast { dst = 255.255.255.255 } 
    predicate mcast { (dst & 0xf0000000) = 0xe0000000 } 
    predicate frag { fragoffset != 0 || (flags & 2) != 0 } 

    demux { 
        ( proto = 6 )  { tcp at hlength_b } 
        ( proto = 17 ) { udp at hlength_b } 
        ( proto = 1 )  { icmp at hlength_b } 
        ( proto = 2 )  { igmp at hlength_b } 
        default { unknown_ip at hlength_b } 
    } 

} 



Diagram 8: Protocol Sample: IP 

Here, ip is the protocol name being defined. The protocol definition includes a number 
of fields which correspond to portions of the IP header comprising one or more bytes. The fields 
vers, hlength, flags and fragoffset have special operations that extract certain 
bits from the IP header. hlength_b holds the length of the header in bytes, computed using the 
hlength field (which is in units of 32-bit words). 
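
The mask-and-shift field definitions above can be mirrored in ordinary C++ as a rough illustration. This is only a sketch: the helper field() and the struct IpFields are hypothetical names introduced here, not part of NCL or the ASL, and the code assumes the header bytes are laid out in network (big-endian) order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of NCL): read `len` bytes (1..4) starting at
// byte offset `off` as a big-endian unsigned value, mirroring ip[off:len].
inline std::uint32_t field(const std::uint8_t* hdr, std::size_t off, std::size_t len) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < len; ++i) {
        v = (v << 8) | hdr[off + i];
    }
    return v;
}

// Hypothetical container for the bit-extracted fields of Diagram 8.
struct IpFields {
    std::uint32_t vers;        // (ip[0:1] & 0xf0) >> 4
    std::uint32_t hlength;     // ip[0:1] & 0x0f, in 32-bit words
    std::uint32_t hlength_b;   // hlength << 2, in bytes
    std::uint32_t flags;       // (ip[6:1] & 0xe0) >> 5
    std::uint32_t fragoffset;  // ip[6:2] & 0x1fff
};

inline IpFields parse_ip(const std::uint8_t* ip) {
    IpFields f;
    f.vers       = (field(ip, 0, 1) & 0xf0) >> 4;
    f.hlength    =  field(ip, 0, 1) & 0x0f;
    f.hlength_b  =  f.hlength << 2;
    f.flags      = (field(ip, 6, 1) & 0xe0) >> 5;
    f.fragoffset =  field(ip, 6, 2) & 0x1fff;
    return f;
}
```

For a header whose first byte is 0x45, this sketch yields vers 4, hlength 5 and hlength_b 20, matching the word-to-byte conversion described above.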

bcast, mcast, and frag are predicates which may be useful in defining other rules or 
predicates. Predicates are defined in Section 7. 

This protocol demuxes into four other protocols, excluding the default, under different 
conditions. In this example, the demultiplexing key is the protocol type specified by the value of 
the IP proto field. All the protocols are activated at offset hlength_b relative to the start of 
the IP header. 
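
The first-match behavior of a demux statement can be sketched in C++ as an ordered list of cases. This is an illustrative model only, not how the NCL compiler implements demultiplexing; DemuxCase, Activation and demux_ip are hypothetical names, and the protocol names are taken from the IP sample above.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one demux case: a Boolean expression and the
// protocol to activate when that expression is the first to evaluate true.
struct DemuxCase {
    std::function<bool(std::uint8_t)> when;
    std::string protocol;
};

// Hypothetical result: which protocol is activated, and at what offset.
struct Activation {
    std::string protocol;
    std::uint32_t offset;
};

// Cases are tried in declaration order; the first true case wins and the
// remaining cases are ignored. The default applies when nothing matches.
inline Activation demux_ip(std::uint8_t proto, std::uint32_t hlength_b) {
    static const std::vector<DemuxCase> cases = {
        {[](std::uint8_t p) { return p == 6; },  "tcp"},
        {[](std::uint8_t p) { return p == 17; }, "udp"},
        {[](std::uint8_t p) { return p == 1; },  "icmp"},
        {[](std::uint8_t p) { return p == 2; },  "igmp"},
    };
    for (const auto& c : cases) {
        if (c.when(proto)) {
            return {c.protocol, hlength_b};
        }
    }
    return {"unknown_ip", hlength_b};
}
```

Because evaluation stops at the first true case, the cases need not be mutually exclusive; their order alone decides which protocol is activated.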

When a protocol is activated due to the processing of a lower-layer demux statement, the 
activated protocol's name becomes a Boolean that evaluates true (it is otherwise false). Thus, if 
the IP protocol is activated, the expression ip will evaluate to true. 
The fields, predicates and intrinsics in a protocol are accessed by specifying the protocol and 
the field, predicate or intrinsic separated by the dot operator. For example: 

ip.length 

ip.bcast 

ip.chksumvalid 

Diagram 9: Sample references 

Users can provide additional declarations for new fields, predicates and demux cases by 
extending previously-defined protocol elements. Any name conflicts will be resolved by using 
the newest definitions. This allows user-provided definitions to override system-supplied 
definitions, which eases updates and migration. The syntax for extensions is the protocol name 
followed by the new element, separated by the dot (.) operator. Following the name is the 
definition enclosed in delimiters, as illustrated below: 

xx.newfield { xx[10:4] } 

predicate xx.newpred { xx[8:2] != 10 } 

xx.demux { 
    (xx[6:2] = 5 ) { newproto at 20 } 
} 



Diagram 10: Sample protocol extension 
In the first example, a new field called newfield is declared for the protocol xx. In the 
second, a new predicate called newpred is defined for the protocol xx. In the third example, a 
new higher-layer protocol newproto is declared as a demultiplexing target for the protocol xx. 
The root of the protocol hierarchy is the reserved protocol frame, which refers to the received 
data from the link-layer. No protocol definition may redefine the protocol frame, 
but new protocol demux operations can be added to it. 

The intrinsics provided are listed in Table 28: 



Intrinsic Name      Functionality 

ip.chksumvalid      Check the validity of the IP header checksum; returns a Boolean value 
tcp.chksumvalid     Check the validity of the TCP pseudo checksum; returns a Boolean value 
udp.chksumvalid     Check the validity of the UDP pseudo checksum; returns a Boolean value 

Table 28: List of intrinsics 



7. Predicates 

Predicates are named Boolean expressions that use protocol header fields, other Boolean 
expressions, and previously-defined predicates as operands. The syntax for predicates is as 
follows: 

predicate predicate_name { boolean_expression } 

For example, 

predicate isTcpSyn { tcp && (tcp.flags & 0x02) != 0 } 
predicate isNewTelnet { isTcpSyn && (tcp.dport = 23) } 

In the second example, the predicate isTcpSyn is used in the expression to evaluate the 
predicate isNewTelnet. 
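
As a loose analogy, the two predicates above behave like Boolean functions in which one calls the other. The C++ sketch below assumes a hypothetical PacketView struct standing in for the activated protocol state; none of these names are part of NCL or the ASL.

```cpp
#include <cstdint>

// Hypothetical snapshot of the classifier state for one packet.
struct PacketView {
    bool tcp_active;          // true if the tcp protocol was activated
    std::uint8_t tcp_flags;   // TCP flag bits
    std::uint16_t tcp_dport;  // TCP destination port
};

// Mirrors: predicate isTcpSyn { tcp && (tcp.flags & 0x02) != 0 }
inline bool isTcpSyn(const PacketView& p) {
    return p.tcp_active && (p.tcp_flags & 0x02) != 0;
}

// Mirrors: predicate isNewTelnet { isTcpSyn && (tcp.dport = 23) }
// A previously defined predicate is reused as an operand.
inline bool isNewTelnet(const PacketView& p) {
    return isTcpSyn(p) && p.tcp_dport == 23;
}
```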

8. Sets 

The language supports the notion of sets and named searches on sets, which can be used 
to efficiently check whether a packet should be considered a member of some 
application-defined equivalence class. Using sets, classification rules requiring persistent 
state may be constructed. The classification language only supports the evaluation of set 
membership; modification of the contents of the sets is handled exclusively by actions in 
conjunction with the ASL. A named search defines a particular search on a set and its name may 
be used as a Boolean variable in subsequent Boolean expressions. Named searches are used to tie 
precomputed lookup results calculated in the classification phase to actions executing in the 
action phase. 

A set is defined using the keyword set followed by an identifier specifying the name of 
the set. The number of keys for any search on the set is specified following the name, between < 
and >. A set definition may optionally include a hint as to the expected number of members of 
the set, specified using the keyword size_hint. The syntax is as follows: 

set set_name < nkeys > { 

    size_hint { expected_population } 

} 

Diagram 11: Declaring a set 

The size_hint does not place a strict limit on the population of the set, but as the set size 
grows beyond the hint value, the search time may slowly increase. 

Predicates and rules may perform named searches (see the following section for a 
discussion of rules). Named searches are specified using the keyword search followed by the 
search name and search keys. The search name consists of two parts: the name of the set to 
search, and the name of the search being defined. The keys may refer to arbitrary expressions, 
but typically refer to fields in protocols. The number of keys defined in the named search must 
match the number of keys defined for the set. The named search may be used in subsequent 
predicates as a Boolean value, where "true" indicates a record is present in the associated set with 
the specified keys. An optional Boolean expression may be included in a named search using the 
requires keyword. If the Boolean expression fails to evaluate true, the search result is always 
"false". The syntax for named searches is as follows: 

search set_name.search_name (key1, key2) { 
    requires { boolean_expression } 
} 

Diagram 12: Named search 

Consider the following example defining a set of transport-layer protocol ports (TCP or UDP): 

#define MAX_TCP_UDP_PORTS_SET_SZ 200 
/* TUPORTS: a set of TCP or UDP ports */ 
set tuports<1> { 
    size_hint { MAX_TCP_UDP_PORTS_SET_SZ } 
} 

search tuports.tcp_sport (tcp.sport) 
search tuports.tcp_dport (tcp.dport) 
search tuports.udp_sport (udp.sport) 
search tuports.udp_dport (udp.dport) 

Diagram 13: Sharing a set definition 

This example illustrates how one set may be used by multiple searches. The set tuports 
might contain a collection of port numbers of interest for either protocol, TCP/IP or UDP/IP. 
The four named searches provide checks as to whether different TCP or UDP source or 
destination port numbers are present in the set. The results of named searches may be used as 
Boolean values in expressions, as illustrated below: 

predicate tcp_sport_in { tuports.tcp_sport } 

predicate tcp_port_in { tuports.tcp_sport && tuports.tcp_dport } 

predicate udp_sdports_in { 
    tuports.udp_sport || tuports.udp_dport 
} 



Diagram 14: Using shared sets 
In the first example, a predicate tcp_sport_in is defined to be the Boolean result of 
the named search tuports.tcp_sport, which determines whether or not the tcp.sport 
field (source port) of a TCP segment is in the set tuports. In the second example, both the 
source and destination ports of the TCP protocol header are searched using named searches. In 
the third case, membership of either the source or destination ports of a UDP datagram in the set 
is determined. 
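
The sharing of one set among several named searches can be sketched in C++ as follows. This is a simplified model, not the classifier's lookup mechanism: Pkt and PortSet are hypothetical, and the checks on the tcp and udp flags are an assumption standing in for the optional requires clause, so that a search on a protocol that is not active yields false.

```cpp
#include <cstdint>
#include <set>

// Hypothetical per-packet view; field names are illustrative only.
struct Pkt {
    bool tcp;             // true if the tcp protocol was activated
    bool udp;             // true if the udp protocol was activated
    std::uint16_t sport;  // transport-layer source port
    std::uint16_t dport;  // transport-layer destination port
};

// One-key set of port numbers, standing in for `set tuports<1>`.
using PortSet = std::set<std::uint16_t>;

// Four named searches over the same set, each using a different key.
inline bool tcp_sport(const PortSet& s, const Pkt& p) { return p.tcp && s.count(p.sport) != 0; }
inline bool tcp_dport(const PortSet& s, const Pkt& p) { return p.tcp && s.count(p.dport) != 0; }
inline bool udp_sport(const PortSet& s, const Pkt& p) { return p.udp && s.count(p.sport) != 0; }
inline bool udp_dport(const PortSet& s, const Pkt& p) { return p.udp && s.count(p.dport) != 0; }

// Predicates built from the search results, as in Diagram 14.
inline bool tcp_sport_in(const PortSet& s, const Pkt& p) { return tcp_sport(s, p); }
inline bool tcp_port_in(const PortSet& s, const Pkt& p) { return tcp_sport(s, p) && tcp_dport(s, p); }
inline bool udp_sdports_in(const PortSet& s, const Pkt& p) { return udp_sport(s, p) || udp_dport(s, p); }
```

In the system described here, the classifier only tests membership; inserting into the set would be done by action code, which this sketch leaves to the caller.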

9. Rules and Actions 

Rules are a named combination of a predicate and action. They are defined using the 
keyword rule. The predicate portion is a Boolean expression consisting of any combination of 
individual Boolean subexpressions or other predicate names. The Boolean value of a predicate 
name corresponds to the Boolean value of its associated predicate portion. The action portion 
specifies the name of the action which is to be invoked when the predicate portion evaluates 
"true" for the current frame. Actions are implemented external to the classifier and supplied by 
application developers. Arguments can be specified for the action function and may include 
predicates, named searches on sets, or results of intrinsic functions. The following illustrates the 
syntax: 

rule rule_name { predicate } { 

    external_action_func (arg1, arg2, ...) 

} 

Diagram 15: Rule syntax 

The argument list defines the values passed to the action code executed externally to NCL. An 
arbitrary number of arguments is supported. 

set set_ip_tcp_ports <3> { 
    size_hint { 100 } 
} 

set set_ip_udp_ports <3> { 
    size_hint { 100 } 
} 

search set_ip_tcp_ports.tcp_dport ( ip.src, ip.dst, tcp.dport ) { 
    requires { ip && tcp } 
} 

search set_ip_udp_ports.udp_dport ( ip.src, ip.dst, udp.dport ) { 
    requires { ip && udp } 
} 

predicate ipValid { ip && ip.chksumvalid && (ip.hlen > 5) && 
                    (ip.ver = 4) } 

predicate newtelnet { (tcp.flags & 0x02) && (tcp.dport = 23) } 
predicate tftp { (udp.dport = 21) && set_ip_udp_ports.udp_dport } 

rule telnetNewCon { ipValid && newtelnet && set_ip_tcp_ports.tcp_dport } 
    { start_telnet ( set_ip_tcp_ports.tcp_dport ) } 

rule tftppkt { ipValid && tftp } 
    { is_tftp_pkt ( udp.dport ) } 

rule addnewtelnet { newtelnet } 
    { add_to_tcp_pkt_count () } 



Diagram 16: Telnet/FTP example 



In the above example, two sets are defined. One contains source and destination IP 
addresses, plus TCP ports. The other set contains IP addresses and UDP ports. Two named 
searches are defined. The first search uses the IP source and destination addresses and the TCP 
destination port number as keys. The second search uses the IP source and destination addresses 
and UDP destination port as keys. The predicate ipValid checks to make sure the packet is an 
IP packet with valid checksum, has a header of acceptable size, and is IP version 4. The predicate 
newtelnet determines if the current TCP segment is a SYN packet destined for a telnet port. 
The predicate tftp determines if the UDP destination port corresponds to the TFTP port 
number and the combination of IP source and destination addresses and destination UDP port 
number is in the set set_ip_udp_ports. The rule telnetNewCon determines if the current 
segment is a new telnet connection, and specifies that the associated external function 
start_telnet will be invoked when this rule is true. The function takes the search result as 
argument. The rule tftppkt checks whether the packet belongs to a TFTP association. If so, 
the associated action is_tftp_pkt will be invoked with udp.dport as the argument. The 
third rule, addnewtelnet, checks if the current segment is a new telnet connection and defines 
the associated action function add_to_tcp_pkt_count. 
10. With Clauses 

A with clause is a special directive providing for conditional execution of a group of rules or 
predicates. The syntax is as follows: 

with boolean_expression { 

    predicate pred_name { any_boolean_exp } 

    rule rule_name { any_boolean_exp } { action_reference } 

} 

Diagram 17: With clause syntax sample 

If the Boolean expression in the with clause evaluates false, all the enclosed predicates and 
rules evaluate false. For example, if we want to evaluate the validity of an IP datagram and use it in a 
set of predicates and rules, these can be encapsulated using the with clause and a conditional, which 
could be the checksum of the IP header. Nested with clauses are allowed, as illustrated in the 
following example: 

predicate tcpValid { tcp && tcp.chksumvalid } 

#define TELNET 23 // port number for telnet 

with ipValid { 

    predicate tftp { (udp.dport = 21) && 
                     ip_udp_ports.udp_dport } 

    with tcpValid { /* Nested with */ 
        predicate newtelnet { (tcp.flags & 0x02) && 
                              tcp.dport = TELNET } 
        rule telnetNewCon { newtelnet && ip_tcp_ports.tcp_dport } 
            { start_telnet ( ip_tcp_ports.tcp_dport ) } 
    } 

    rule tftppkt { tftp && ip_udp_ports.udp_dport } 
        { is_tftp_pkt ( udp.dport ) } 

} 


Diagram 18: Nested with clauses 
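
The effect of a with clause, including nesting, can be approximated in C++ by a guard that short-circuits the enclosed expression. This is a sketch under stated assumptions: the helper with() and the struct Pkt are hypothetical names, and the port number 23 is taken from the TELNET definition above.

```cpp
// Hypothetical per-packet state used by the guards below.
struct Pkt {
    bool ip_valid;       // result of the ipValid predicate
    bool tcp_valid;      // result of the tcpValid predicate
    bool syn;            // (tcp.flags & 0x02) != 0
    unsigned tcp_dport;  // TCP destination port
};

// Hypothetical helper: if the guard is false, the enclosed predicate
// evaluates false without being examined, as a with clause specifies.
template <typename Pred>
bool with(bool guard, Pred pred) {
    return guard && pred();
}

// newtelnet is enclosed by two nested guards: ipValid, then tcpValid.
inline bool newtelnet(const Pkt& p) {
    return with(p.ip_valid, [&] {
        return with(p.tcp_valid, [&] {
            return p.syn && p.tcp_dport == 23;
        });
    });
}
```

Placing the guard once around a group avoids repeating the same validity test in every enclosed predicate and rule.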
11. Protocol Definitions for TCP/IP 

The following NCL definitions are used for processing of TCP/IP and related protocols. 

/***************************** frame (base unit) *****************************/ 
protocol frame { 

    // status words written by NetBoost Ethernet MACs 
    rxstatus { frame[0x180:4] }   // receive status 
    rxstamp  { frame[0x184:4] }   // receive time stamp 
    txstatus { frame[0x188:4] }   // xmit status (if sent out) 
    txstamp  { frame[0x18C:4] }   // xmit time stamp (if sent) 

    predicate rxerror { (rxstatus & 0x80000000) } 
    length { (rxstatus & 0x07FF0000) >> 16 }   // frame len 
    source { (rxstatus & 0x00000F00) >> 8 }    // hardware origin 
    offset { (rxstatus & 0x000000FF) }         // start of frame 
    predicate txok { (txstatus & 0x80000000) != 0 }   // tx success 

    demux { 
        rxerror { frame_bad at 0 } 

        // source 0: NetBoost onboard MAC A ethernet packet 
        // source 1: NetBoost onboard MAC B ethernet packet 
        // source 2: Other rxstatus-encodable ethernet packet 
        (source < 3) { ether at 0x180 + offset } 

        default { frame_bad at 0 } 
    } 

} 

protocol frame_bad { 
} 

/*************************** ETHERNET **********************************/ 

#define ETHER_IPTYPE   0x0800 
#define ETHER_ARPTYPE  0x0806 
#define ETHER_RARPTYPE 0x8035 

protocol ether { 

    dst     { ether[0:6] }    // destination ethernet address 
    src     { ether[6:6] }    // source ethernet address 
    typelen { ether[12:2] }   // length or type, depends on encap 
    snap    { ether[14:6] }   // SNAP code if present 
    type    { ether[20:2] }   // type for 802.3 encaps 

    // We are only interested in a specific subset of the possible 
    // 802.3 encapsulations; specifically, those where the 802.2 LLC area 
    // contains DSAP=0xAA, SSAP=0xAA, and CNTL=0x03, followed by 
    // the 802.2 SNAP area containing the ORG code 0x000000. In this 
    // case, the 802.2 SNAP "type" field contains one of our ETHER 
    // type values defined above. 

    predicate issnap { (typelen <= 1500) && (snap = 0xAAAA03000000) } 
    offset { 14 + (issnap << 3) } 

    demux { 
        typelen = ETHER_ARPTYPE  { arp at offset } 
        typelen = ETHER_RARPTYPE { arp at offset } 
        typelen = ETHER_IPTYPE   { ip at offset } 
        issnap && (type = ETHER_ARPTYPE)  { arp at offset } 
        issnap && (type = ETHER_RARPTYPE) { arp at offset } 
        issnap && (type = ETHER_IPTYPE)   { ip at offset } 
        default { ether_bad at 0 } 
    } 

} 

protocol ether_bad { 
} 

/************************ ARP PROTOCOL ************************/ 

#define ARPHRD_ETHER     1   /* ethernet hardware format */ 
#define ARPHRD_FRELAY    15  /* frame relay hardware format */ 

#define ARPOP_REQUEST    1   /* request to resolve address */ 
#define ARPOP_REPLY      2   /* response to previous request */ 
#define ARPOP_REVREQUEST 3   /* request protocol address given hardware */ 
#define ARPOP_REVREPLY   4   /* response giving protocol address */ 
#define ARPOP_INVREQUEST 8   /* request to identify peer */ 
#define ARPOP_INVREPLY   9   /* response identifying peer */ 

protocol arp { 

    htype  { arp[0:2] } 
    ptype  { arp[2:2] } 
    hsize  { arp[4:1] } 
    psize  { arp[5:1] } 
    op     { arp[6:2] } 
    varhdr { 8 } 

    predicate ethip4 
        { (op <= ARPOP_REVREPLY) && (htype = ARPHRD_ETHER) && 
          (ptype = ETHER_IPTYPE) && (hsize = 6) && (psize = 4) } 

    demux { 
        ethip4  { ether_ip4_arp at varhdr } 
        default { unimpl_arp at 0 } 
    } 

} 

protocol unimpl_arp { 
} 

protocol ether_ip4_arp { 
    shaddr { ether_ip4_arp[0:6] } 
    spaddr { ether_ip4_arp[6:4] } 
    thaddr { ether_ip4_arp[10:6] } 
    tpaddr { ether_ip4_arp[16:4] } 
} 


/************************ IP v4 ************************/ 

protocol ip { 

    verhl   { ip[0:1] } 
    ver     { (verhl & 0xf0) >> 4 } 
    hl      { (verhl & 0x0f) } 
    hlen    { hl << 2 } 
    tos     { ip[1:1] } 
    length  { ip[2:2] } 
    id      { ip[4:2] } 
    ffo     { ip[6:2] } 
    flags   { (ffo & 0xe000) >> 13 } 
    fragoff { (ffo & 0x1fff) } 
    ttl     { ip[8:1] } 
    proto   { ip[9:1] } 
    cksum   { ip[10:2] } 
    src     { ip[12:4] } 
    dst     { ip[16:4] } 

    // variable length options start at offset 20 

    predicate dbcast { dst = 255.255.255.255 } 
    predicate sbcast { src = 255.255.255.255 } 
    predicate smcast { (src & 0xF0000000) = 0xE0000000 } 
    predicate dmcast { (dst & 0xF0000000) = 0xE0000000 } 

    predicate dontfr { (flags & 2) != 0 }   // "do not fragment this packet" 
    predicate morefr { (flags & 1) != 0 }   // "not last frag in datagram" 
    predicate isfrag { morefr || fragoff } 

    predicate options { hlen > 20 } 
    intrinsic chksumvalid { } 

    predicate okhwlen { (frame.length - ether.offset) >= length } 

    predicate invalid { (ver != 4) || (hlen < 20) || 
                        ((frame.length - ether.offset) < length) || 
                        (length < hlen) || !chksumvalid } 

    predicate badsrc { sbcast || smcast } 

    demux { 
        // Demux expressions are evaluated in order, and the 
        // first one that matches causes a demux to the protocol; 
        // once one matches, no further checks are made, so the 
        // cases do not have to be precisely mutually exclusive. 

        invalid      { ip_bad at 0 } 
        badsrc       { ip_badsrc at 0 } 
        (proto = 1)  { icmp at hlen } 
        (proto = 2)  { igmp at hlen } 
        (proto = 6)  { tcp at hlen } 
        (proto = 17) { udp at hlen } 
        default      { ip_unknown_transport at hlen } 
    } 

} 

protocol ip_bad { 
} 

protocol ip_badsrc { 
} 

protocol ip_unknown_transport { 
} 


/*************************** UDP ********************************/ 

protocol udp { 
    sport  { udp[0:2] } 
    dport  { udp[2:2] } 
    length { udp[4:2] } 
    cksum  { udp[6:2] } 
    intrinsic chksumvalid { }   /* undefined if a frag */ 
    predicate valid { ip.isfrag || chksumvalid } 
} 


/*************************** TCP *******************************/ 

protocol tcp { 
    sport { tcp[0:2] } 
    dport { tcp[2:2] } 
    seq   { tcp[4:4] } 
    ack   { tcp[8:4] } 
    hlf   { tcp[12:2] } 
    hl    { (hlf & 0xf000) >> 12 } 
    hlen  { hl << 2 } 
    flags { (hlf & 0x003f) } 
    win   { tcp[14:2] } 
    cksum { tcp[16:2] } 
    urp   { tcp[18:2] } 

    intrinsic chksumvalid { }   /* undefined if IP Fragment */ 

    predicate valid { ip.isfrag || ((hlen >= 20) && chksumvalid) } 

    predicate opt_present { hlen > 20 } 
} 

/************************** ICMP ***********************************/ 

protocol icmp { 
    type  { icmp[0:1] } 
    code  { icmp[1:1] } 
    cksum { icmp[2:2] } 
} 

/************************** IGMP ***********************************/ 

protocol igmp { 
    vertype  { igmp[0:1] } 
    ver      { (vertype & 0xf0) >> 4 } 
    type     { (vertype & 0x0f) } 
    reserved { igmp[1:1] } 
    cksum    { igmp[2:2] } 
    group    { igmp[4:4] } 
} 



VIII. ASL 

The Application Services Library (ASL) provides a set of library functions available to 
action code that are useful for packet processing. The complete environment available to action 
code includes: the ASL; a restricted C/C++ library and runtime environment; one or more 
domain specific extensions such as TCP/IP. 
The Restricted C/C++ Libraries And Runtime Environment 

Action code may be implemented in either the ANSI C or C++ programming languages. 
A library supporting most of the functions defined in the ANSI C and C++ libraries is provided. 
These libraries are customized for the NetBoost PE hardware environment, and as such differ 
slightly from their equivalents in a standard host operating system. Most notably, file operations 
are restricted to the standard error and output streams (which are mapped into upcalls). 
In addition to the C and C++ libraries available to action code, NetBoost supplies a 
specialized C and C++ runtime initialization object module which sets up the C and C++ 
run-time environments by initializing the set of environment variables and, in the case of C++, 
executing constructors for static objects. 

1. ASL Functions 

The ASL contains class definitions of potential use to any action code executing in the 
PE. It includes memory allocation, management of API objects (ACEs, targets), upcall/downcall 
support, set manipulation, timers, and a namespace support facility. The components 
comprising the ASL library are as follows: 
Basic Scalar Types 

The library contains basic type definitions that include the number of bits represented. 
These include int8 (8 bit integers), int16 (16 bit integers), int32 (32 bit integers), and int64 (64 
bit integers). In addition, unsigned values (uint8, uint16, uint32, uint64) are also supported. 

Special Endian-Sensitive Scalar Types 

The ASL is commonly used for manipulating the contents of packets which are generally 
in network byte order. The ASL provides type definitions similar to the basic scalar types, but 
which represent data in network byte order. Types in network byte order are declared in the same 
fashion as the basic scalar types but with a leading n prefix (e.g. nuint16 refers to an unsigned 16 
bit quantity in network byte order). The following functions are used to convert between the 
basic types (host order) and the network order types: 

uint32 ntohl (nuint32 n);    // network to host (32 bit) 
uint16 ntohs (nuint16 n);    // network to host (16 bit) 
nuint32 htonl (uint32 h);    // host to network (32 bit) 
nuint16 htons (uint16 h);    // host to network (16 bit) 

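
One way to picture the distinction between host-order and network-order types is the C++ sketch below. It is not the ASL implementation: the struct layouts and the function names net_to_host16, host_to_net16, net_to_host32 and host_to_net32 are assumptions chosen here to avoid clashing with platform headers, and they play the roles of ntohs, htons, ntohl and htonl respectively.

```cpp
#include <cstdint>

// Hypothetical network-order types: the bytes are stored most significant
// first, and the distinct types prevent accidental mixing with host values.
struct nuint16 { std::uint8_t bytes[2]; };
struct nuint32 { std::uint8_t bytes[4]; };

// Network to host (16 bit): assemble the value from big-endian bytes.
inline std::uint16_t net_to_host16(nuint16 n) {
    return static_cast<std::uint16_t>((n.bytes[0] << 8) | n.bytes[1]);
}

// Host to network (16 bit): split the value into big-endian bytes.
inline nuint16 host_to_net16(std::uint16_t h) {
    return nuint16{{static_cast<std::uint8_t>(h >> 8),
                    static_cast<std::uint8_t>(h & 0xff)}};
}

// Network to host (32 bit).
inline std::uint32_t net_to_host32(nuint32 n) {
    return (static_cast<std::uint32_t>(n.bytes[0]) << 24) |
           (static_cast<std::uint32_t>(n.bytes[1]) << 16) |
           (static_cast<std::uint32_t>(n.bytes[2]) << 8) |
            static_cast<std::uint32_t>(n.bytes[3]);
}

// Host to network (32 bit).
inline nuint32 host_to_net32(std::uint32_t h) {
    return nuint32{{static_cast<std::uint8_t>(h >> 24),
                    static_cast<std::uint8_t>((h >> 16) & 0xff),
                    static_cast<std::uint8_t>((h >> 8) & 0xff),
                    static_cast<std::uint8_t>(h & 0xff)}};
}
```

Because the conversions work on individual bytes, this sketch gives the same result regardless of the byte order of the host processor.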
Macros and Classes for Handling Errors and Exceptions in the ASL 

The ASL contains a number of C/C++ macro definitions used to aid in debugging and 
code development (and mark fatal error conditions). These are listed below: 

ASSERT Macros (assert a Boolean expression; halt on failure) 
CHECK Macros (assert a Boolean expression; return from the current real-time loop on failure) 
STUB Macros (give a message with the C++ file name and line number) 
SHO Macros (used to monitor the value of a variable/expression during execution) 

Exceptions 

The ASL contains a number of functions available for use as exception handlers. 
Exceptions are a programming construct used to deliver error information up the call stack. 
The following functions are provided for handling exceptions: 

NBaction_err and NBaction_warn functions, to be invoked when exceptions are thrown. 

OnError class, used to invoke functions during exception handling, mostly for debugger 
breakpoints. 

ACE support 

Ace objects in the ASL contain the per-Ace state information. To facilitate common 
operations, the base Ace class provides pass and drop targets, which are built 
when an Ace instance is constructed. If no write action is taken on a buffer that arrives at the Ace 
(i.e. none of the actions of matching rules indicates it took ownership), the buffer is sent to the 
pass target. The pass and drop functions (i.e. target take functions, below) may be used directly 
as actions within the NCL application description, or they may be called by other actions. 

Member functions of the Ace class include: pass(), drop(), enaRule() - enable a rule, 
disRule() - disable a rule. 

Action support: 

The init_actions() call is the primary entry point into the application's Action code. 
It is used by the ASL startup code to initialize the PE portion of the Network Application. It is 
responsible for constructing an Ace object of the proper class, and typically does nothing else. 
Example syntax: 

INITF init_actions (void* id, char* name, Image* obj) 
{ 
    return new ExampleAce (id, name, obj); 

} 

The function should return a pointer to an object subclassed from the Ace class, or a NULL 
pointer if an Ace could not be constructed. Throwing an NBaction_err or 
NBaction_warn exception may also be appropriate and will be caught by the initialization 
code. Error conditions will be reported back to the Resolver as a failure to create the Ace. 
Return Values from Action Code/Handlers 

When a rule's action portion is invoked because the rule's predicate portion evaluated true, 
the action function must return a code indicating how processing should proceed. The action 
may return a code indicating it has disposed of the frame (ending the classification phase), or it 
may indicate it did not dispose of the frame, and further classification (rule evaluations) should 
continue. A final option available is for the action to return a defer code, indicating that it wishes 
to modify a frame, but that the frame is in use elsewhere. The return values are defined as 
C/C++ pre-processor definitions: 

• #define RULE_DONE ... 

Actions should return RULE_DONE to terminate processing of rules and actions within the 
context of the current Ace; for instance, when a buffer has been sent to a target, or stored for 
later processing. 

• #define RULE_CONT ... 

Actions should return RULE_CONT if they have merely observed the buffer and wish for 
additional rules and actions within the context of the current Ace to be processed. 

• #define RULE_DEFER ... 

Actions should return RULE_DEFER if they wish to modify a packet within a buffer but the 
buffer notes that the packet is currently busy elsewhere. 
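
The interplay of the three return codes can be pictured as a loop over the rules of an Ace. The C++ sketch below is an assumption-laden model, not the ASL dispatcher: the enumerator values are placeholders (the actual definitions are elided above), Rule and run_rules are hypothetical names, and how the real system retries after RULE_DEFER is not specified here, so the sketch simply stops and reports it.

```cpp
#include <functional>
#include <vector>

// Placeholder values; the real definitions are pre-processor macros whose
// values are not given in the text.
enum RuleResult { RULE_DONE, RULE_CONT, RULE_DEFER };

// Hypothetical rule: a predicate plus an action returning a result code.
struct Rule {
    std::function<bool()> predicate;
    std::function<RuleResult()> action;
};

// Evaluate rules in order. An action is invoked only when its predicate is
// true. RULE_CONT lets evaluation continue; RULE_DONE and RULE_DEFER stop it.
// `invoked` reports how many actions ran.
inline RuleResult run_rules(const std::vector<Rule>& rules, int& invoked) {
    invoked = 0;
    for (const auto& r : rules) {
        if (!r.predicate()) {
            continue;
        }
        ++invoked;
        RuleResult rc = r.action();
        if (rc != RULE_CONT) {
            return rc;
        }
    }
    // No action disposed of the frame; per the ACE support text, the buffer
    // would then go to the pass target (not modeled here).
    return RULE_CONT;
}
```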
Predefined Actions 

The common cases of disposing of a frame, by either dropping it or sending it on to the 
next classification entity for processing, are supported by two helper functions available to NCL 
code, which result in calls to the functions Ace::pass() or Ace::drop() within the ASL: 

action_pass (predefined action): passes the frame to the 'pass target'; always returns RULE_DONE 
action_drop (predefined action): passes the frame to the 'drop target'; always returns RULE_DONE 

User-Defined Actions 

Most often, user-defined actions are used in an Ace. Such actions are implemented with 
the following calling structure. 

The ACTNF return type is used to set up linkage. Action handlers take two arguments: 
pointer to the current buffer being processed, and the Ace associated with this action. Example: 

ACTNF do_mcast (Buffer *buf, ExAce *ace) { 
    ace->mcast_ct++; 
    cout << ace->name() << " : " << ace->mcast_ct << endl; 
    return ace->drop(buf); 
} 

Thus, the Buffer* and ExAce* types are passed to the handler. In this case, ExAce is derived 
from the base Ace class: 

#include "NBaction/NBaction.h" 

class ExAce : public Ace { 
public: 
    ExAce (ModuleId id, char *name, Image *obj) 
        : Ace(id, name, obj), mcast_ct(0) { } 
    int mcast_ct; 
}; 

INITF init_actions (void *id, char *name, Image *obj) { 
    return new ExAce (id, name, obj); 
} 

Buffer Management (Buffer class) 

The basic unit of processing in the ASL is the Buffer. All data received from the network 
is received in buffers, and all data to be transmitted must be properly formatted into buffers. 
Buffers are reference-counted. Contents are typed (more specifically, the first header 
has a type, represented as an integer/enumerated value). Member functions of the Buffer class 
support common trimming operations (trim head, trim tail) plus additions (prepend and append 
data). Buffers are assigned a time stamp upon arrival and departure (if they are transmitted). 
The member function rxTime() returns the receipt time stamp of the frame contained in the 
buffer. The member function txTime() gives the transmission-complete time stamp of the buffer 
if the frame it contains has been transmitted. Several additional member functions and operators 
are supported: 

new() - allocates a buffer from a pool structure (see below) 
headerBase() - location of the first network header 
headerOffset() - reference to the byte offset from the start of storage to the first network header 
packetSize() - number of bytes in the frame 
headerType() - type of the first header 
packetPadHeadSize() - free space before the net packet 
packetPadTailSize() - free space after the net packet 
prepend() - add data to the beginning 
append() - add data to the end 
trim_head() - remove data from the head 
trim_tail() - remove data from the end 
{rx,tx}Time() - see above 
next() - reference to the next buffer on a chain 
incref() - bump the reference count 
decref() - decrement the reference count 
busy() - indicates the buffer is being processed 
log() - allows for adding information to the 'transaction log' of a buffer, which can indicate 
what has processed it 

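
The trimming, prepend/append and reference-counting operations listed above can be illustrated with a much-simplified C++ model. SketchBuffer is a hypothetical class written for this illustration only; it borrows member names from the list above but makes no claim about the real Buffer class, its storage layout, or its pool allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model: a packet held inside fixed storage with free space
// (padding) before and after it, plus a simple reference count.
class SketchBuffer {
public:
    SketchBuffer(std::size_t capacity, std::size_t head_pad)
        : storage_(capacity),
          start_(head_pad < capacity ? head_pad : capacity),
          size_(0),
          refs_(1) {}

    std::size_t packetSize() const { return size_; }
    std::size_t packetPadHeadSize() const { return start_; }
    std::size_t packetPadTailSize() const { return storage_.size() - start_ - size_; }
    const std::uint8_t* headerBase() const { return storage_.data() + start_; }

    // Add data at the end; fails if the tail padding is too small.
    bool append(const void* data, std::size_t n) {
        if (n > packetPadTailSize()) return false;
        std::memcpy(storage_.data() + start_ + size_, data, n);
        size_ += n;
        return true;
    }

    // Add data at the beginning; fails if the head padding is too small.
    bool prepend(const void* data, std::size_t n) {
        if (n > start_) return false;
        start_ -= n;
        size_ += n;
        std::memcpy(storage_.data() + start_, data, n);
        return true;
    }

    // Remove n bytes from the head or tail without copying the payload.
    bool trim_head(std::size_t n) {
        if (n > size_) return false;
        start_ += n;
        size_ -= n;
        return true;
    }
    bool trim_tail(std::size_t n) {
        if (n > size_) return false;
        size_ -= n;
        return true;
    }

    void incref() { ++refs_; }
    // Returns true when the last reference has been dropped.
    bool decref() { return --refs_ == 0; }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t start_;  // offset of the first packet byte in storage
    std::size_t size_;   // number of packet bytes
    int refs_;           // reference count, starts at 1
};
```

Keeping padding on both sides of the packet is one plausible reason a buffer would expose separate head and tail pad sizes: headers can then be added or stripped by moving an offset rather than copying the payload.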
Targets 

Target objects within an Ace indicate the next hardware or software resource that will 
classify a buffer along a selected path. Targets are bound to another Ace within the same 
application, an Ace within a different application, or a built in resource such as decryption. 
Bindings for Targets are set up by the plumber (see above). The class includes the member 
function take() which sends a buffer to the next downstream entity for classification. 

Targets have an associated module and Ace (specified by a "ModuleId" object and an 
Ace*). They also have a name in the name space contained in the resolver, which associates 
Aces to applications. 

Upcall 

An upcall is a form of procedure call initiated in the PE module and handled in the AP 
module. Upcalls provide communication between the "inline" portion of an application and its 
"slower path" executing in the host environment. Within the ASL, the upcall facility sends 
messages to the AP. Messages are defined below. The upcall class contains the member 
function call() - which takes objects of type Message* and sends them asynchronously to 
AP module. 
DowncallHandler 

A downcall is a form of procedure call initiated in the AP module and handled in the PE 
module. Downcalls provide communication in the opposite direction from upcalls. The class
contains the member function direct() which provides a pointer to the member function of the
Ace class that is to be invoked when the associated downcall is requested in the AP. The Ace
member function pointed to takes a Message* as its argument.
Message 

Messages contain zero, one, or two blocks of message data, which are independently 
constructed using the MessageBlock constructors (below). Uninitialized blocks will appear at 
the Upcall handler in the AP module as zero length messages. Member functions of the 
Message class include: msg1(), msg2(), len1(), len2() - return addresses and lengths of the
messages [if present]. Other member functions: clr1(), clr2(), done() - acknowledge receipt of a
message and free resources.
MessageBlock 

The MessageBlock class is used to encapsulate a region of storage within the Policy 
Engine memory that will be used in a future Upcall Message. It also includes a method to be 
called when the service software has copied the data out of that storage and no longer needs it to
be stable (and can allow it to be recycled). Constructor syntax is as follows:

MessageBlock (char *msg, int len=0, DoneFp done=0);

MessageBlock (Buffer *buf);

MessageBlock (int len, int off=0);
The first form specifies an existing data area to be used as the data source. If the completion 
callback function (DoneFp) is specified, it will be called when the data has been copied out of 
the source area. Otherwise, no callback is made and no special actions are taken after the data is 
copied out of the message block. If no length is specified, then the base pointer is assumed to 
point to a zero-terminated string; the length is calculated to include the null termination. The 
second form specifies a Buffer object; the data transferred is the data contained within the 
buffer, and the relative alignment of the data within the 32-bit word is retained. The reference 
count on the buffer is incremented when the MessageBlock is created, and the callback function 
is set to decrement the reference count when the copy out is complete. This will have the effect 
of marking the packet as "busy" for any actions that check for busy buffers, as well as preventing 
the buffer from being recycled before the copy out is complete. The third form requests that 
MessageBlock handle dynamic allocation of a region of memory large enough to hold a message 
of a specified size. Optionally, a second parameter can be specified that gives the offset from the 
32-bit word alignment boundary where the data should start. The data block will retain this 
relative byte offset throughout its transfer to the Application Processor. This allows, for instance, 
allocating a 1514-byte data area with 2-byte offset, building an Ethernet frame within it, and 
having any IP headers included in the packet land properly aligned on 32-bit alignment 
boundaries. 
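
The alignment arithmetic behind the Ethernet example can be checked with a short sketch. The helper alignmentOf() and the constant it is called with are illustrative assumptions and are not part of the ASL; the only fact relied on is that an Ethernet header is 14 bytes long.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an ASL function): a data area starts `areaOffset`
// bytes past a 32-bit word boundary; return the position, modulo 4, of a
// header that begins `headerStart` bytes into that data area.
constexpr std::size_t alignmentOf(std::size_t areaOffset, std::size_t headerStart) {
    return (areaOffset + headerStart) % 4;
}

// A 14-byte Ethernet header precedes the IP header in the frame.
static_assert(alignmentOf(0, 14) == 2, "no offset: IP header is misaligned by 2");
static_assert(alignmentOf(2, 14) == 0, "2-byte offset: IP header is word aligned");
```

With an offset of 0 the IP header would begin two bytes past a word boundary; the 2-byte offset described above moves it onto the boundary.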
Sets

Sets are an efficient way to track a large number of equivalence classes of packets, so that 
state can be kept for all packets that have the same values in specific fields. For instance, the 
programmer might wish to count the number of packets that flow between any two specific IP 
address pairs, or keep state for each TCP stream. Sets represent collections of individual 
members, each one of which matches buffers with a specific combination of field values. If the 
programmer instead wishes to form sets of the form "the set of all packets with IP header lengths
greater than twenty bytes," then the present form of sets is not appropriate; instead, a
Classification Predicate should be used.

In NCL, the only information available regarding a set is whether or not a set contained a 
record corresponding to a vector of search keys. Within the ASL, all other set operations are 
supported: searches, insertions, and removals. For searches conducted in the CE, the ASL 
provides access to additional information obtained during the search operation: specifically, a 
pointer to the actual element located (for successful searches), and other helpful information such 
as an insertion pointer (on failure). The actual elements stored in each set are of a class 
constructed by the compiler, or are of a class that the software vendor has subclassed from that 
class. The hardware environment places strict requirements on the alignment modulus and 
alignment offset for each set element. 

As shown in the NCL specification, a single set may be searched by several vectors of 
keys, resulting in multiple search results that share the same target element records. Each of 
these directives results in the construction of a function that fills the key fields of the suitable 
Element subclass from a buffer. 
Within the ASL, the class set is used to abstract a set. It serves as a base class for
compiler generated classes specific to the sets specified in the NCL program (see below). 
Search 

The Search class is the data type returned by all set searching operations, whether 
provided directly by the ASL or executed within the classification engine. Member functions:
ran() - true if the CE executed this search on a set, hit() - true if the CE found a match using
this search, miss() - inverse of hit() but can return a cookie making inserts faster,
toElement() - converts successful search result to underlying object, insert() - insert an
object at the place the miss() function indicates we should.
Element 

Contents of sets are called elements, and the NCL compiler generates a collection of 
specialized classes derived from the Element base class to contain user-specified data within 
set elements. Set elements may have an associated timeout value, indicating the maximum 
amount of time the set element should be maintained. After the time out is reached, the set 
element is automatically removed from the set. The time out facility is useful for monitoring 
network activity such as packet flows that should eventually be cleared due to inactivity. 
Compiler-Generated Elt_<setname> Classes 

For each set directive in the NCL program, the NCL compiler produces an adjusted 
subclass of the Element class called Elt_<setname>, substituting the name of the set for 
<setname>. This class is used to define the type of elements of the specified set. Because each 
set declaration contains the number of keys needed to search the set, this compiler-generated 
class is specialized from the element base class for the number of words of search key being 
used. 
Compiler-Generated Set_<setname> Classes

For each set directive in the NCL program, the NCL compiler produces an adjusted
subclass of the Set class called Set_<setname>, substituting the name of the set for
<setname>. This class is used to define the lookup functions of the specified set. The NCL
compiler uses the number of words of key information to customize the parameter list for the
lookup function; the NCL size_hint is used to adjust a protected field within the class. Aces
that need to manipulate sets should include an object of the customized Set class as a member
of their Ace.
Events 

The Event class provides for execution of functions at arbitrary times in the future, with 
efficient rescheduling of the event and the ability to cancel an event without destroying the event 
marker itself. A calendar queue is used to implement the event mechanism. When constructing 
objects of the Event class, two optional parameters may be specified: the function to be called 
(which must be a member function of a class based on Event), and an initial scheduled time (how 
long in the future, expressed as a Time object). When both parameters are specified, the event's 
service function is set and the event is scheduled. If the Time parameter is not specified, the 
Event's service function is still set but the event is not scheduled. If the service function is not 
set, it is assumed that the event will be directed to a service function before it is scheduled in the 
future. Member functions of this class include: direct() - specifies the function to be
executed at expiry, schedule() - indicates how far in the future the event should trigger, cancel()
- unschedule event, curr() - get time of currently running event.
Rate

The Rate class provides a simple way to track event rates and bandwidths in order to 
watch for rates exceeding desired values. The Rate constructor allows the application to specify 
arbitrary sampling periods. The application can (optionally) specify how finely to divide the 
sampling period. Larger divisors result in more precise rate measurement but require more 
overhead, since the Rate object schedules Events for each of the shorter periods while there are 
events within the longer period. Member functions of this class include: clear() - reset
internal state, add() - bumps event count, count() - gives best estimate of the current trailing rate
of events over the last (longer) period.
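
The trade-off between divisor size and overhead can be illustrated with a bucketed counter. The following is only a sketch under stated assumptions: the class name RateSketch and its tick() method are hypothetical, and the caller advances the sub-period explicitly, whereas the text says the real Rate object schedules Events for that purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative stand-in for a rate tracker (not the ASL Rate class).
// The sampling period is split into `divisions` sub-periods, one bucket each.
class RateSketch {
public:
    explicit RateSketch(std::size_t divisions) : buckets_(divisions, 0u), cur_(0) {
        assert(divisions > 0);
    }
    // Reset internal state.
    void clear() { buckets_.assign(buckets_.size(), 0u); cur_ = 0; }
    // Record n events in the current sub-period.
    void add(unsigned n = 1) { buckets_[cur_] += n; }
    // Advance to the next sub-period, discarding the oldest bucket.
    void tick() {
        cur_ = (cur_ + 1) % buckets_.size();
        buckets_[cur_] = 0;
    }
    // Events seen over the trailing sampling period.
    unsigned count() const {
        return std::accumulate(buckets_.begin(), buckets_.end(), 0u);
    }
private:
    std::vector<unsigned> buckets_;
    std::size_t cur_;
};
```

More buckets make the trailing window track real time more closely, at the cost of one tick() per sub-period, which mirrors the overhead the text attributes to larger divisors.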
Time 

The Time class provides a common format for carrying around a time value. Absolute, 
relative, and elapsed times are all handled identically. As conversions to and from int64 (a sixty- 
four bit unsigned integer value) are provided, all scalar operators are available for use; in 
addition, the assignment operators are explicitly provided. Various other classes use Time 
objects to specify absolute times and time intervals. For maximum future flexibility in selection 
of storage formats, the actual units of the scalar time value are not specified; instead, they are 
stored as a class variable. Extraction of meaningful data should be done via the appropriate 
access methods rather than by direct arithmetic on the Time object. 

Class methods are available to construct Time objects for specified numbers of standard 
time units (microseconds, milliseconds, seconds, minutes, hours, days and weeks); also, methods 
are provided for extraction of those standard time periods from any Time object. Member
functions include: curr() - returns current real time; operators: +=, -=, *=, /=, %=, <<=, >>=,
|=, ^=, &=; accessors and builders: usec(), msec(), secs(), mins(), hour(), days(), week(), which
access or build Time objects using the specified number of microseconds, milliseconds, seconds,
minutes, hours, days, and weeks, respectively.
Memory Pool 

The Pool class provides a mechanism for fast allocation of objects of fixed sizes at 
specified offsets from specified power-of-two alignments, restocking the raw memory resources 
from the PE module memory pool as required. The constructor creates an object that describes 
the contents of the memory pool and contains the configuration control information for how 
future allocations will be handled. 

Special 'offset' and 'restock' parameters are used. The offset parameter allows allocation
of classes where a specific member needs to be strongly aligned; for example, objects from the 
Buffer class contain an element called hard that must start at the beginning of a 2048-byte- 
aligned region. The restock parameter controls how much memory is allocated from the 
surrounding environment when the pool is empty. Enough memory is allocated to contain at least 
the requested number of objects, of the specified size, at the specified offset from the alignment 
modulus. Member functions include: take() - allocate a chunk, free() - return a chunk to the
pool.

Tagged Memory Pool 

Objects that carry with them a reference back to the pool from which they were taken are
called tagged. This is most useful for cases when the code that frees the object will not
necessarily know what pool it came from. This class is similar to normal Memory Pools, except
for internal details and the calling sequence for freeing objects back into the pool. The Tagged
class trades some additional space overhead for the flexibility of being able to free objects
without knowing which Tagged pool they came from; this is similar to the overhead required by
most C library malloc implementations. If the object has strong alignment requirements, the
single added word of overhead could cause much space to be wasted between the objects. For 
instance, if the objects were 32 bytes long and were required to start on 32-byte boundaries, the 
additional word would cause another 28 bytes of padding to be wasted between adjacent objects. 
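
The 28-byte figure follows from rounding the tagged object up to the next alignment boundary. A minimal sketch of that arithmetic, assuming a 4-byte tag word; the function names are hypothetical and not part of the ASL:

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of align (align must be non-zero).
constexpr std::size_t roundUp(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Padding wasted per object when a tag of tagSize bytes travels with each
// object and consecutive objects must start on align-byte boundaries.
constexpr std::size_t wastedPerObject(std::size_t objSize, std::size_t tagSize,
                                      std::size_t align) {
    return roundUp(objSize + tagSize, align) - (objSize + tagSize);
}

// 32-byte objects, 4-byte tag (assumed word size), 32-byte alignment.
static_assert(wastedPerObject(32, 4, 32) == 28, "matches the example in the text");
```

Each tagged object then occupies a 64-byte stride rather than 32, which is why untagged pools are preferable for strongly aligned objects.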

The Tagged class adds a second (static) version of the take method, which is passed the 
size of the object to be allocated. The Tagged class manages an appropriate set of pools based on 
possible object sizes, grouping objects of similar size together to limit the number of pools and 
allow sharing of real memory between objects of slightly different sizes. Member functions
include: take() - allocate a chunk, free() - return a chunk to the pool.
Dynamic 

This class takes care of overloading the new and delete operators, redirecting the 
memory allocation to use a number of Tagged Pools managed by the NB ACTION DLL. All 
classes derived from Dynamic share the same set of Tagged Pools; each pool handles a specific 
range of object sizes, and objects of similar sizes will share the same Tagged Pool. The dynamic 
class has no storage requirements and no virtual functions. Thus, declaring objects derived from 
Dynamic will not change the size or layout of your objects (just how they are allocated). 
Operators defined include: new() - allocate object from underlying pool, delete() - return to 
underlying pool. 
Name Dictionary 

The Name class keeps a database of named objects (that are arbitrary pointers in the
memory address space of the ASL). It provides mechanisms for adding objects to the dictionary,
finding objects by name, and removing them from the dictionary. It is implemented with a
Patricia Tree (a structure often used in longest prefix match in routing table lookups). Member
functions include: find() - look up string, name() - return name of dictionary.

2. ASL Extensions for TCP/IP 

The TCP/IP Extensions to the Action Services Library (ASL) provide a set of class
definitions designed to make several tasks common to TCP/IP-based network-oriented
applications easier. With functions spanning several protocol layers, the library includes operations such
as IP fragment reassembly and TCP stream reconstruction. Note that many of the functions that
handle Internet data make use of 16- and 32-bit data types beginning with 'n' (such as nuint16
and nuint32). These data types refer to data in network byte order (i.e. big endian). Functions
used to convert between host and network byte order, such as htonl() (which converts a 32-bit word
from host to network byte order), are also defined.
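
What the conversion does to the bytes in memory can be shown with a portable sketch. The names hostToNet32() and netToHost32() are hypothetical stand-ins written for illustration; they are not the ASL functions, which the text identifies only by the name htonl().

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns a value whose in-memory byte sequence is the big-endian encoding
// of `host`, regardless of the endianness of the machine running the code.
inline std::uint32_t hostToNet32(std::uint32_t host) {
    const unsigned char bytes[4] = {
        static_cast<unsigned char>(host >> 24),
        static_cast<unsigned char>(host >> 16),
        static_cast<unsigned char>(host >> 8),
        static_cast<unsigned char>(host)
    };
    std::uint32_t net;
    std::memcpy(&net, bytes, sizeof net);
    return net;
}

// Inverse conversion: reads the four bytes of `net` as a big-endian number.
inline std::uint32_t netToHost32(std::uint32_t net) {
    unsigned char bytes[4];
    std::memcpy(bytes, &net, sizeof bytes);
    return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
           (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
}
```

On a big-endian machine both functions return their argument unchanged; on a little-endian machine they reverse the byte order.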

3. The Internet Class 

Functions of potential use to any Internet application are grouped together as methods of 
the Internet class. These functions are declared static within the class, so that they may be 
used easily without requiring an instantiation of the Internet class. 
Internet Checksum Support 

The Internet Checksum is used extensively within the TCP/IP protocols to provide 
reasonably high assurance that data has been delivered correctly. In particular, it is used in IP 
(for headers), TCP and UDP (for headers and data), ICMP (for headers and data), and IGMP (for 
headers). 

The Internet checksum is defined to be the 1's complement of the sum of a region of data,
where the sum is computed using 16-bit words and 1's complement addition.

Computation of this checksum is documented in a number of RFCs (available from
ftp://ds.internic.net/rfc): RFC 1936 describes a hardware implementation, RFC
1624 and RFC 1141 describe incremental updates, and RFC 1071 describes a number of
mathematical properties of the checksum and how to compute it quickly. RFC 1071 also includes
a copy of IEN 45 (from 1978), which describes motivations for the design of the checksum.
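
The definition above can be written out directly. The following is a sketch of the RFC 1071 algorithm, not the ASL source; the function name internetChecksum() is an assumption, and the result is returned as an ordinary number whose high byte corresponds to the first byte of the checksum field on the wire.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the Internet checksum over `len` bytes starting at `base`.
// The 32-bit accumulator cannot overflow for inputs up to 64 KB, which
// covers any single IP datagram.
inline std::uint16_t internetChecksum(const unsigned char* base, std::size_t len) {
    assert(base != nullptr || len == 0);
    std::uint32_t sum = 0;
    std::size_t i = 0;
    for (; i + 1 < len; i += 2) {
        sum += (std::uint32_t(base[i]) << 8) | base[i + 1];  // one 16-bit word
    }
    if (i < len) {
        sum += std::uint32_t(base[i]) << 8;  // odd trailing byte, zero padded
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);  // end-around carry
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFF);  // 1's complement
}
```

For the eight bytes 00 01 f2 03 f4 f5 f6 f7 used as the worked example in RFC 1071, the folded sum is 0xddf2 and the checksum is therefore 0x220d.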

The ASL provides the following functions to calculate Internet Checksums: 
cksum 

Description 

Computes the Internet Checksum of the data specified. This function works properly for 
data aligned to any byte boundary, but may perform (significantly) better for 32-bit aligned data. 
Syntax 

static nuint16 Internet::cksum (u_char* base, int len);
Parameters

Parameter   Type              Description
base        unsigned char *   The starting address of the data.
len         int               The number of bytes of data.

Return value 

Returns the Internet Checksum in the same byte order as the underlying data, which is 
assumed to be in network byte order (big endian). 
psum 

Description 

Computes the 2's-complement sum of a region of data taken as 16-bit words. The
Internet Checksum for the specified data region may be generated by folding any carry bits
above the low-order 16 bits and taking the 1's complement of the resulting value.
Syntax 
static uint32 Internet::psum (u_char* base, int len);
Parameters

Parameter   Type              Description
base        unsigned char *   The starting address of the data.
len         int               The number of bytes of data.

Return value

Returns the 2's-complement 32-bit sum of the data treated as an array of 16-bit words.
incrcksum 

Description 

Computes a new Internet Checksum incrementally. That is, a new checksum is computed 
given the original checksum for a region of data, a checksum for a block of data to be replaced, 
and a checksum of the new data replacing the old data. This function is especially useful when 
small regions of packets are modified and checksums must be updated appropriately (e.g. for 
decrementing IP ttl fields or rewriting address fields for NAT). 
Syntax 

static uint16 Internet::incrcksum (nuint16 ocksum, nuint16 odsum, nuint16 ndsum);
Parameters

Parameter   Type      Description
ocksum      nuint16   The original checksum.
odsum       nuint16   The checksum of the old data.
ndsum       nuint16   The checksum of the new (replacing) data.

Return value

Returns the computed checksum.
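
RFC 1624 gives the incremental update rule as HC' = ~(~HC + ~m + m'), where HC is the old checksum, m the old value of a 16-bit field and m' its new value, with all additions in 1's complement. The sketch below implements that rule on raw field values. It is not the ASL incrcksum, which takes checksums of the old and new data and whose internals are not shown here; the names onesAdd() and updateChecksum() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// 1's complement 16-bit addition with end-around carry.
inline std::uint16_t onesAdd(std::uint16_t a, std::uint16_t b) {
    const std::uint32_t s = std::uint32_t(a) + b;
    return static_cast<std::uint16_t>((s & 0xFFFF) + (s >> 16));
}

// RFC 1624, equation 3: HC' = ~(~HC + ~m + m').
inline std::uint16_t updateChecksum(std::uint16_t oldCksum, std::uint16_t oldWord,
                                    std::uint16_t newWord) {
    std::uint16_t sum = onesAdd(static_cast<std::uint16_t>(~oldCksum),
                                static_cast<std::uint16_t>(~oldWord));
    sum = onesAdd(sum, newWord);
    return static_cast<std::uint16_t>(~sum);
}
```

Using the worked example from RFC 1624 (HC = 0xDD2F, m = 0x5555, m' = 0x3285), updateChecksum returns 0x0000. For a TTL decrement, the 16-bit word in question is the one that holds the TTL and protocol bytes.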

asum
Description

The function asum computes the checksum over only the IP source and destination
addresses.
Syntax

static uint16 asum (IP4Header* hdr);
Parameters

Parameter   Type          Description
hdr         IP4Header *   Pointer to the header.

Return value

Returns the checksum.
apsum 

Description 

The function apsum behaves like asum but includes the address plus the two 16-bit 
words immediately following the IP header (which are the port numbers for TCP and UDP). 
Syntax 

static uint16 apsum (IP4Header* hdr);
Parameters

Parameter   Type          Description
hdr         IP4Header *   Pointer to the header.

Return value 

Returns the checksum. 
apssum
Description 

The function apssum behaves like apsum, but covers the IP addresses, ports, plus TCP 
sequence number. 
Syntax 

static uint16 apssum (IP4Header* hdr);
Parameters

Parameter   Type          Description
hdr         IP4Header *   Pointer to the header.

Return value

Returns the checksum.
apasum 

Description 

The function apasum behaves like apssum, but covers the TCP ACK field instead of
the sequence number field.
Syntax

static uint16 apasum (IP4Header* hdr);
Parameters

Parameter   Type          Description
hdr         IP4Header *   Pointer to the header.

Return value 

Returns the checksum. 
apsasum
Description 

The function apsasum behaves like apasum but covers the IP addresses, ports, plus the 
TCP ACK and sequence numbers. 
Syntax 

static uint16 apsasum (IP4Header* hdr);
Parameters

Parameter   Type          Description
hdr         IP4Header *   Pointer to the header.

Return value 

Returns the checksum. 

4. IP Support 

This section describes the class definitions and constants used in processing IP-layer data. 
Generally, all data is stored in network byte order (big endian). Thus, care should be taken by the 
caller to ensure computations result in proper values when processing network byte ordered data 
on little endian machines (e.g. in the NetBoost software-only environment on pc-compatible 
architectures). 

5. IP Addresses 

The IP4Addr class defines 32-bit IP version 4 addresses. 
Constructors 
Description 

The class IP4Addr is the abstraction of an IP (version 4) address within the ASL. It has
two constructors, allowing for the creation of the IPv4 addresses given an unsigned 32-bit word
in either host or network byte order. In addition, the class is derived from nuint32, so IP
addresses may generally be treated as 32-bit integers in network byte order.

Syntax 

IP4Addr (nuint32 an); 
IP4Addr (uint32 ah) ; 
Parameters

Parameter   Type      Description
an          nuint32   Unsigned 32-bit word in network byte order.
ah          uint32    Unsigned 32-bit word in host byte order.

Return value 

None. 

Example 

The following simple example illustrates the creation of addresses: 
#include "NBip.h"

uint32 myhaddr = (128 << 24) | (32 << 16) | (12 << 8) | 4;
nuint32 mynaddr = htonl((128 << 24) | (32 << 16) | (12 << 8) | 4);
IP4Addr ip1 (myhaddr);
IP4Addr ip2 (mynaddr);

This example creates two IP4Addr objects, each of which refers to the IP address 128.32.12.4.
Note the use of the htonl() ASL function to convert the host 32-bit word into network byte
order.

6. IP Masks 
Masks are often applied to IP addresses in order to determine network or subnet numbers,
CIDR blocks, etc. The class IP4Mask is the ASL abstraction for a 32-bit mask, available to be 
applied to an IPv4 address (or for any other use). 
Constructor 

Description 

Instantiates the IP4Mask object with the mask specified.

Syntax

IP4Mask (nuint32 mn);
IP4Mask (uint32 mh);
Parameters

Parameter   Type      Description
mh          uint32    32-bit mask in host byte order.
mn          nuint32   32-bit mask in network byte order.

Return value

None.

leftcontig
Description

Returns true if all of the 1-bits in the mask are left-contiguous, and returns false
otherwise.
Syntax

bool leftcontig ();
Parameters
None.
Return value

Returns true if all the 1-bits in the mask are left-contiguous.

bits 
Description 

The function bits returns the number of left-contiguous 1-bits in the mask (a form of
"population count").
Syntax

int bits ();
Parameters
None.
Return value

Returns the number of left-contiguous bits in the mask. Returns -1 if the 1-bits in the
mask are not left-contiguous.
Example 

#include <stdio.h>
#include "NBip.h"

char msgbuf[64];                 // destination for the formatted message
uint32 mymask = 0xffffff80;      // 255.255.255.128 or /25
IP4Mask ipm (mymask);
int nbits = ipm.bits();
if (nbits >= 0) {
    sprintf (msgbuf, "Mask is of the form /%d", nbits);
} else {
    sprintf (msgbuf, "Mask is not left-contiguous!");
}

This example creates a subnet mask with 25 bits, and sets up a message buffer containing a string 
which describes the form of the mask (using the common "slash notation" for subnet masks). 
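
One way the two mask queries could be computed is sketched below. This is an assumption about a possible implementation, not the ASL source; the free functions leftContiguous() and prefixBits() are hypothetical and operate on a mask in host byte order.

```cpp
#include <cassert>
#include <cstdint>

// True if every 1-bit in `mask` is left-contiguous (including the empty mask).
inline bool leftContiguous(std::uint32_t mask) {
    const std::uint32_t inv = ~mask;   // the 0-bits of the mask
    return (inv & (inv + 1)) == 0;     // inv must have the form 0...01...1
}

// Number of leading 1-bits of a left-contiguous mask, or -1 otherwise.
inline int prefixBits(std::uint32_t mask) {
    if (!leftContiguous(mask)) return -1;
    int n = 0;
    while (mask & 0x80000000u) {       // count 1-bits from the left
        ++n;
        mask <<= 1;
    }
    return n;
}
```

For the mask 0xffffff80 used in the example, the inverted mask is 0x7f, the contiguity test passes, and the count is 25.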
7. IP Header 

The IP4Header class defines the standard IP header, where sub-byte sized fields have 
been merged in order to reduce byte-order dependencies. In addition to the standard IP header, 
the class includes a number of methods for convenience. The class contains no virtual functions, 
and therefore pointers to the IP4Header class may be used to point to IP headers received in
live network packets. 

The class contains a number of member functions, some of which provide direct access to 
the header fields and others which provide computed values based on header fields. Members 
which return computed values are described individually; those functions which provide only
simple access to fields are as follows:

Function   Return Type   Description
vhl()      nuint8&       Returns a reference to the byte containing the IP version and header length
tos()      nuint8&       Returns a reference to the IP type of service byte
len()      nuint16&      Returns a reference to the IP datagram (fragment) length in bytes
id()       nuint16&      Returns a reference to the IP identification field (used for fragmentation)
offset()   nuint16&      Returns a reference to the word containing fragmentation flags and fragment offset
ttl()      nuint8&       Returns a reference to the IP time-to-live byte
proto()    nuint8&       Returns a reference to the IP protocol byte
cksum()    nuint16&      Returns a reference to the IP checksum
src()      IP4Addr&      Returns a reference to the IP source address
dst()      IP4Addr&      Returns a reference to the IP destination address

The following member functions of the IP4Header class provide convenient methods for
accessing various information about an IP header.

optbase 

Description 

Returns the location of the first IP option in the IP header (if present). 

Syntax 

unsigned char* optbase (); 

Parameters 

None. 

Return value 

Returns the address of the first option present in the header. If no options are present, it 
returns the address of the first byte of the payload. 
hl

Description

The first form of this function returns the number of 32-bit words in the IP header.
The second form modifies the header length field to be equal to the specified length.
Syntax
int hl ();
void hl (int h);
Parameters

Parameter   Type   Description
h           int    Specifies the header length (in 32-bit words) to assign to the IP header.

Return value

The first form of this function returns the number of 32-bit words in the IP header.

hlen 
Description 

The function hlen returns the number of bytes in the IP header (including options). 

Syntax 

int hlen ( ) ; 
Parameters 
None. 
Return value 

Returns the number of bytes in the IP header including options. 
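
The relationship between the merged version/header-length byte and the computed accessors can be sketched as follows. The struct VhlSketch is a hypothetical stand-in that holds only that one byte; it is not the IP4Header class, and it assumes the standard IPv4 layout in which the version occupies the high four bits and the header length the low four bits.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the byte exposed by vhl() (not the ASL class).
struct VhlSketch {
    std::uint8_t vhl;

    int ver() const { return vhl >> 4; }      // high nibble: IP version
    int hl() const { return vhl & 0x0F; }     // low nibble: length in 32-bit words
    int hlen() const { return hl() * 4; }     // header length in bytes

    void ver(int v) {
        assert(v >= 0 && v <= 15);
        vhl = static_cast<std::uint8_t>((v << 4) | hl());
    }
    void hl(int h) {
        assert(h >= 0 && h <= 15);
        vhl = static_cast<std::uint8_t>((ver() << 4) | h);
    }
};
```

A header without options has the byte 0x45: version 4, five words, twenty bytes. Setting hl(6) to make room for one word of options changes the byte to 0x46.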

ver 

Description 

The first form of this function ver returns the version field of the IP header (should be 4). 
The second form assigns the version number to the IP header. 

Syntax 

int ver ();

void ver (int v);

Parameters

Parameter   Type   Description
v           int    Specifies the version number.

Return value

The first form returns the version field of the IP header.
payload 
Description

The function payload returns the address of the first byte of data (beyond any options 
present). 
Syntax 

unsigned char* payload(); 
Parameters 
None. 
Return value 

Returns the address of the first byte of payload data in the IP packet. 

psum 
Description 

The function psum is used internally by the ASL library, but may be useful to some 
applications. It returns the 16-bit one's complement sum of the source and destination IP 
addresses plus 8-bit protocol field [in the low-order byte]. It is useful in computing pseudo- 
header checksums for UDP and TCP. 
Syntax 

uint32 psum( ) ; 
Parameters 
None. 
Return value 

Returns the 16-bit one's complement sum of the source and destination IP addresses plus 
the 8-bit protocol field. 
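
A sketch of the pseudo-header partial sum described above, under stated assumptions: the function names are hypothetical, addresses are passed as host-order numbers for readability, and the sum is returned unfolded so that further terms can be added before folding. A complete TCP or UDP pseudo-header checksum also covers the segment length, which this function does not include.

```cpp
#include <cassert>
#include <cstdint>

// Unfolded sum of the 16-bit halves of both addresses plus the protocol,
// with the protocol in the low-order byte of its 16-bit word.
inline std::uint32_t pseudoSum(std::uint32_t src, std::uint32_t dst,
                               std::uint8_t proto) {
    return (src >> 16) + (src & 0xFFFF) + (dst >> 16) + (dst & 0xFFFF) + proto;
}

// Fold any carries above bit 15 back into the low 16 bits.
inline std::uint16_t fold(std::uint32_t sum) {
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return static_cast<std::uint16_t>(sum);
}
```

For source 192.168.0.1, destination 192.168.0.199 and protocol 17 (UDP), the unfolded sum is 0x18229 and the folded value is 0x822A.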
Definitions 
In addition to the IP header itself, a number of definitions are provided for manipulating
fields of the IP header with specific semantic meanings. 
Fragmentation

Define       Value    Description
IP_DF        0x4000   Don't fragment flag, RFC 791, p. 13.
IP_MF        0x2000   More fragments flag, RFC 791, p. 13.
IP_OFFMASK   0x1FFF   Mask for determining the fragment offset from the IP header offset() function.
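
The three fragmentation definitions are typically applied together to the flags/offset word. A minimal sketch, assuming the word has already been converted to host byte order; the constant and function names are hypothetical, and the multiplication by 8 reflects the RFC 791 rule that the fragment offset is expressed in units of eight bytes.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the fragmentation definitions above.
const std::uint16_t kIpDf = 0x4000;
const std::uint16_t kIpMf = 0x2000;
const std::uint16_t kIpOffMask = 0x1FFF;

struct FragInfo {
    bool dontFragment;    // DF flag
    bool moreFragments;   // MF flag
    unsigned byteOffset;  // payload offset within the original datagram
};

// Decode the flags/offset word (host byte order).
inline FragInfo decodeOffset(std::uint16_t off) {
    FragInfo f;
    f.dontFragment = (off & kIpDf) != 0;
    f.moreFragments = (off & kIpMf) != 0;
    f.byteOffset = unsigned(off & kIpOffMask) * 8;  // units of 8 bytes
    return f;
}
```

A packet is a fragment of a larger datagram when either moreFragments is set or byteOffset is non-zero.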



Limitations 

IP_MAXPACKET   65535   Maximum IP datagram size.
IP Service Type

The following table contains the definitions for IP type of service byte (not commonly
used):

Define              Value   Reference
IPTOS_LOWDELAY      0x10    RFC 791, p. 12.
IPTOS_THROUGHPUT    0x08    RFC 791, p. 12.
IPTOS_RELIABILITY   0x04    RFC 791, p. 12.
IPTOS_MINCOST       0x02    RFC 1349.



IP Precedence 

The following table contains the definitions for IP precedence. All are from RFC 791, p.
12 (not widely used).

Define                       Value
IPTOS_PREC_NETCONTROL        0xE0
IPTOS_PREC_INTERNETCONTROL   0xC0
IPTOS_PREC_CRITIC_ECP        0xA0
IPTOS_PREC_FLASHOVERRIDE     0x80
IPTOS_PREC_FLASH             0x60
IPTOS_PREC_IMMEDIATE         0x40
IPTOS_PREC_PRIORITY          0x20
IPTOS_PREC_ROUTINE           0x00

Option Definitions

The following table contains the definitions for supporting IP options. All definitions are
from RFC 791, pp. 15-23.

Define            Value        Description
IPOPT_COPIED(o)   ((o)&0x80)   A macro which returns true if the option 'o' is to be copied upon fragmentation.
IPOPT_CLASS(o)    ((o)&0x60)   A macro giving the option class for the option 'o'.
IPOPT_NUMBER(o)   ((o)&0x1F)   A macro giving the option number for the option 'o'.
IPOPT_CONTROL     0x00         Control class.
IPOPT_RESERVED1   0x20         Reserved.
IPOPT_DEBMEAS     0x40         Debugging and/or measurement class.
IPOPT_RESERVED2   0x60         Reserved.
IPOPT_EOL         0            End of option list.
IPOPT_NOP         1            No operation.
IPOPT_RR          7            Record packet route.
IPOPT_TS          68           Time stamp.
IPOPT_SECURITY    130          Provide s, c, h, tcc.
IPOPT_LSRR        131          Loose source route.
IPOPT_SATID       136          Satnet ID.
IPOPT_SSRR        137          Strict source route.
IPOPT_RA          148          Router alert.
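
The three macros split an option type byte into its copied flag, class and number. The sketch below applies them to a few of the option values listed above; the macro bodies are copied from the table, while the struct OptType and the function splitOptionType() are hypothetical helpers added for illustration.

```cpp
#include <cassert>

// Macros as given in the option definitions above.
#define IPOPT_COPIED(o) ((o) & 0x80)
#define IPOPT_CLASS(o)  ((o) & 0x60)
#define IPOPT_NUMBER(o) ((o) & 0x1F)

struct OptType {
    bool copied;    // copied into every fragment
    int optClass;   // 0x00 control, 0x40 debugging and measurement
    int number;     // option number within the class
};

// Decompose one option type byte.
inline OptType splitOptionType(unsigned char o) {
    OptType t;
    t.copied = IPOPT_COPIED(o) != 0;
    t.optClass = IPOPT_CLASS(o);
    t.number = IPOPT_NUMBER(o);
    return t;
}
```

For example, loose source route (131, or 0x83) is copied on fragmentation and is control option number 3, while time stamp (68, or 0x44) is not copied and is option number 4 of the debugging and measurement class.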



Options Field Offsets 



The following table contains the offsets to fields in options other than EOL and NOP.

Define         Value   Description
IPOPT_OPTVAL   0       Option ID.
IPOPT_OLEN     1       Option length.
IPOPT_OFFSET   2       Offset within option.
IPOPT_MINOFF   4       Minimum value of offset.
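
These offsets are what a parser needs in order to step through the options area: EOL and NOP are single bytes, and every other option carries its total length at offset IPOPT_OLEN. The following sketch walks an options area on that basis. The function listOptions() and the constants are hypothetical names introduced for illustration, and the validation rules are a conservative assumption rather than ASL behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kOptEol = 0;            // IPOPT_EOL
const int kOptNop = 1;            // IPOPT_NOP
const std::size_t kOptVal = 0;    // IPOPT_OPTVAL
const std::size_t kOptLen = 1;    // IPOPT_OLEN

// Returns the option types found in order. On a malformed list, sets *ok to
// false and returns an empty vector.
inline std::vector<int> listOptions(const unsigned char* opts, std::size_t len,
                                    bool* ok) {
    assert(ok != nullptr);
    std::vector<int> types;
    *ok = true;
    std::size_t i = 0;
    while (i < len) {
        const int type = opts[i + kOptVal];
        if (type == kOptEol) break;                        // end of list
        if (type == kOptNop) { types.push_back(type); ++i; continue; }
        if (i + kOptLen >= len) { *ok = false; break; }    // length byte missing
        const std::size_t olen = opts[i + kOptLen];
        if (olen < 2 || i + olen > len) { *ok = false; break; }
        types.push_back(type);
        i += olen;                                         // skip whole option
    }
    if (!*ok) types.clear();
    return types;
}
```

The length checks matter because the options area comes from the network; an option length of 0 or 1 would otherwise stall or misalign the walk.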




7. Fragments and Datagrams 

The IP protocol performs adaptation of its datagram size by an operation known as 
fragmentation. Fragmentation allows for an initial (large) IP datagram to be broken into a 
sequence of IP fragments, each of which is treated as an independent packet until they are 
received and reassembled at the original datagram's ultimate destination. Conventional IP routers 
never reassemble fragments but instead route them independently, leaving the destination host to 
reassemble them. In some circumstances, however, applications running on the NetBoost 
platform may wish to reassemble fragments themselves (e.g. to simulate the operation of the 
destination host). 

8. IP Fragment class 

Within the ASL, a fragment represents a single IP packet (containing an IP header), 
which may or may not be a complete IP layer datagram. In addition, a datagram within the ASL
represents a collection of fragments. A datagram (or fragment) is said to be complete if it 
represents or contains all the fragments necessary to represent an entire IP-layer datagram. 
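
The completeness condition can be stated as a coverage test over the fragments collected so far. The following is a sketch of one way to express it, not the ASL implementation; the struct FragSpan and the function isComplete() are hypothetical, and each fragment is reduced to its payload byte offset, payload length and MF flag.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct FragSpan {
    unsigned offset;      // byte offset of this fragment's payload
    unsigned length;      // payload bytes carried
    bool moreFragments;   // MF flag from the IP header
};

// True if the fragments cover the datagram from byte 0 to the end of the
// fragment whose MF flag is clear, with no holes.
inline bool isComplete(std::vector<FragSpan> frags) {
    if (frags.empty()) return false;
    std::sort(frags.begin(), frags.end(),
              [](const FragSpan& a, const FragSpan& b) { return a.offset < b.offset; });
    unsigned covered = 0;   // bytes [0, covered) are present
    unsigned total = 0;     // datagram payload size, known once MF=0 is seen
    bool sawLast = false;
    for (const FragSpan& f : frags) {
        if (f.offset > covered) return false;   // hole before this fragment
        covered = std::max(covered, f.offset + f.length);
        if (!f.moreFragments) {
            sawLast = true;
            total = f.offset + f.length;
        }
    }
    return sawLast && covered == total;
}
```

An unfragmented packet is the single-element case: offset 0 with MF clear, which is complete by itself.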

The IP4Fragment class is defined as follows.
Constructors
Description

The IP4Fragment class provides the abstraction of a single IP packet placed in an ASL
buffer (see the description of the Buffer elsewhere in this chapter). It has two constructors
intended for use by applications.

• The first of these allows for specifying the buffer containing an IP fragment as the
parameter bp. The location of the IP header within the buffer is the second
argument. This is the most commonly used constructor when processing IP fragments in
ACE action code.

• The second form of the constructor performs the same steps as the first form, but also
allocates a new Buffer object and copies the IP header pointed to by protohdr into the new
buffer (if specified). This form of the constructor is primarily intended for creation of IP
fragments during IP datagram fragmentation. If the specified header contains IP options,
only those options which are copied during fragmentation are copied.

Syntax 

IP4Fragment(Buffer* bp, IP4Header* iph); 

IP4Fragment(int maxiplen, IP4Header* protohdr = 0); 

Parameters 



Parameter 


Type 


Description 


bp 


Buffer * 


The starting address of the buffer containing the IP fragment 


maxiplen 


int 


The maximum size of the fragment being created; used to size the 
allocated Buffer. 


protohdr 


IP4Header * 


The IP4 header to copy into the buffer, if provided. If the header 
contains IP options, only those options normally copied during 
fragmentation are copied. 



Return value 

None. 
Destructor 

Description 

Frees the fragment. 

Syntax 

~IP4Fragment(); 
Parameters 

None. 

Return Value 

None. 
hdr 

Description 

The function hdr returns the address of the IP header of the fragment. 

Syntax 

IP4Header* hdr ( ) ; 
Parameters 
None. 
Return Value 

Returns the address of the IP4Header class at the beginning of the fragment. 
payload 

Description 

The function payload returns the address of the first byte of data in the IP fragment 
(after the basic header and options). 
Syntax 

u_char* payload (); 
Parameters 
None. 
Return Value 

Returns the address of the first byte of data in the IP fragment. 
buf 

Description 

The function buf returns the address of the Buffer structure containing the IP fragment. 
Syntax 

Buffer* buf () ; 
Parameters 
None. 
Return Value 

Returns the address of the Buffer structure containing the IP fragment. This may 
return NULL if there is no buffer associated with the fragment. 
next 
Description 

Returns a reference to the pointer pointing to the next fragment of a doubly-linked list of 
fragments. This is used to link together fragments when they are reassembled (in Datagrams), or 
queued, etc. Typically, fragments are linked together in a doubly-linked list fashion with NULL 
pointers indicating the list endpoints. 
Syntax 

IP4Fragment*& next(); 
Parameters 
None. 
Return Value 

Returns a reference to the internal linked-list pointer. 

prev 
Description 

Like next, but returns a reference to the pointer to the previous fragment on the list. 

Syntax 

IP4Fragment*& prev(); 
Parameters 

None. 
Return Value 

Returns a reference to the internal linked-list pointer. 
first 
Description 

The function first returns true when the fragment represents the first fragment of a 
datagram. 
Syntax 

bool first ( ) ; 
Parameters 
None. 
Return Value 

Returns true when the fragment represents the first fragment of a datagram. 
fragment 
Description 

Fragments an IP datagram comprising a single fragment. The fragment() function 
allocates Buffer structures to hold the newly-formed IP fragments and links them together. It 
returns the head of the doubly-linked list of fragments. Each fragment in the list will be limited 
in size to at most the specified MTU size. The original fragment is unaffected. 

Syntax 

IP4Datagram* fragment (int mtu) ; 
Parameters 



Parameter 


Type 


Description 


mtu 


int 


The maximum transmission unit (MTU) size, which limits the maximum 
fragment size 



Return Value 



Returns a pointer to an IP4Datagram object containing a doubly-linked list of 
IP4Fragment objects. Each fragment object is contained within a Buffer class allocated 
by the ASL library. The original fragment object (the one fragmented) is not freed by this 
function. The caller must free the original fragment when it is no longer needed. 
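
To make the MTU limit concrete, the following sketch computes how a payload could be split so that each fragment fits within a given MTU. It is an illustration only: the names FragPlan and plan_fragments are hypothetical, every fragment is assumed to carry a header of the same fixed length (that is, no IP options), and the internal behavior of fragment() itself is not described in this document. The one fact relied upon is the standard IP rule that every fragment except the last carries a multiple of 8 payload bytes.

```cpp
#include <vector>

// Hypothetical description of one planned fragment (not an ASL type).
struct FragPlan {
    int  offset;   // payload offset in bytes from the start of the datagram
    int  length;   // payload bytes carried by this fragment
    bool more;     // true if further fragments follow
};

// Hypothetical helper: split payload_len bytes into fragments that fit in
// mtu bytes, assuming a fixed header of hdr_len bytes on every fragment.
std::vector<FragPlan> plan_fragments(int payload_len, int hdr_len, int mtu) {
    std::vector<FragPlan> plan;
    // Non-final fragments must carry a multiple of 8 payload bytes.
    const int max_data = ((mtu - hdr_len) / 8) * 8;
    if (payload_len <= 0 || max_data <= 0) return plan;
    for (int off = 0; off < payload_len; off += max_data) {
        const int remaining = payload_len - off;
        const int n = remaining < max_data ? remaining : max_data;
        plan.push_back({off, n, off + n < payload_len});
    }
    return plan;
}
```

For example, a 4000-byte payload with a 20-byte header and an MTU of 1500 yields payloads of 1480, 1480, and 1040 bytes.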
complete 
Description 

The function complete returns true when the fragment represents a complete IP 
datagram. 
Syntax 

bool complete ( ) ; 
Parameters 
None. 
Return Value 

Returns true when the fragment represents a complete IP datagram (that is, when the 
fragment offset field is zero and there are no additional fragments). 
optcopy 
Description 

The static method optcopy is used to copy options from one header to another during IP 
fragmentation. The function will only copy those options that are supposed to be copied during 
fragmentation (i.e. for those options x where the macro IPOPT_COPIED(x) is non zero (true)). 
Syntax 

static int optcopy ( IP4Header* src, IP4Header* dst); 
Parameters 



Parameter 


Type 


Description 


src 


IP4Header * 


Pointer to the source IP header containing options 


dst 


IP4Header * 


Pointer to the destination IP header, to which the options are copied 



Return value 

Returns the number of bytes of options present in the destination IP header. 
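
The copy decision can be illustrated with the standard layout of the IP option type byte (RFC 791): the high-order bit is the "copied" flag, the next two bits are the option class, and the low five bits are the option number. The sketch below uses hypothetical function names (ipopt_copied, ipopt_class, ipopt_number); it is assumed, but not stated in this document, that the IPOPT_COPIED macro tests the same bit.

```cpp
#include <cstdint>

// Hypothetical equivalents of the option-type accessors (names assumed).
constexpr bool ipopt_copied(std::uint8_t opt) {
    return (opt & 0x80) != 0;          // high bit: copy into every fragment
}
constexpr std::uint8_t ipopt_class(std::uint8_t opt) {
    return opt & 0x60;                 // 0x00 control, 0x40 debug/measurement
}
constexpr std::uint8_t ipopt_number(std::uint8_t opt) {
    return opt & 0x1f;                 // option number within the class
}
```

With the option values listed earlier, loose source route (131) and router alert (148) have the copied bit set, whereas record route (7) and time stamp (68) do not and would appear only in the first fragment.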
9. IP Datagram class 

The class IP4Datagram represents a collection of IP fragments, which may (or may 
not) represent a complete IP4 datagram. Note that objects of the class IP4Datagram include a 
doubly-linked list of IP4Fragment objects in sorted order (sorted by IP offset). When IP 
fragments are inserted into a datagram (in order to perform reassembly), coalescing of data 
between fragments is not performed automatically. Thus, although the IP4Datagram object 
may easily determine whether it contains a complete set of fragments, it does not automatically 
reconstruct a contiguous buffer of the original datagram's contents for the caller. 

This class supports the fragmentation, reassembly, and grouping of IP fragments. The 
IP4Datagram class is defined as follows: 
Constructors 

Description 

The class has two constructors. 

• The first form of the constructor is used when creating a fresh datagram (typically for 
starting the process of reassembly). 

• The second form is useful when an existing list of fragments is to be placed into the 
datagram immediately at its creation. 

Syntax 

IP4Datagram(); 

IP4Datagram(IP4Fragment* frag); 
Parameters 



Parameter 


Type 


Description 


frag 


IP4Fragment * 


Pointer to a doubly linked list of fragments used to create the 
datagram object 



Return value 
None. 
Destructor 

Description 

The destructor calls the destructors for each of the fragments comprising the datagram 
and frees the datagram object. 
len 

Description 

The len function returns the entire length (in bytes) of the datagram, including all of its 
constituent fragments. Its value is only meaningful if the datagram is complete. 
Syntax 
int len ( ) ; 
Parameters 

None. 
Return value 

Returns the length of the entire datagram (in bytes). If the datagram contains multiple 
fragments, only the size of the first fragment header is included in this value. 
fragment 
Description 

The fragment function breaks an IP datagram into a series of IP fragments, each of 
which will fit in the packet size specified by mtu. Its behavior is equivalent to the 
IP4Fragment::fragment(int mtu) function described previously. 
Syntax 

IP4Datagram* fragment(int mtu); 
Parameters 

See IP4Fragment::fragment(int mtu) above. 
Return value 

See IP4Fragment::fragment(int mtu) above. 
insert 
Description 

The function insert inserts a fragment into the datagram. The function attempts to 
reassemble the overall datagram by checking the IP offset and ID fields. 
Syntax 

int insert ( IP4Fragment* frag); 
Parameters 



Parameter 


Type 


Description 


frag 


IP4Fragment * 


Pointer to the fragment being inserted. 



Return value 

Because this function can fail or behave unusually in many ways, the following definitions are 
provided to indicate the results of insertions that were attempted by the caller. The return value 
is a 32-bit word where each bit indicates a different error or unusual condition. The first 
definition below, IPD_INSERT_ERROR, is set whenever any of the other conditions are 
encountered. This is an extensible list which may evolve to indicate new error conditions in 
future releases: 



Define                Description
IPD_INSERT_ERROR      'Or' of all other error bits.
IPD_INSERT_OH         Head overlapped.
IPD_INSERT_OT         Tail overlapped.
IPD_INSERT_MISMATCH   Payload mismatch.
IPD_INSERT_CKFAIL     IP header checksum failed (if enabled).



nfrags 
Description 

The function nfrags returns the number of fragments currently present in the datagram. 

Syntax 

int nfrags(); 
complete 

Description 

The function complete returns true when all fragments comprising the original 
datagram are present. 
Syntax 

bool complete ( ) ; 
Parameters 
None. 
Return value 

Returns a boolean value indicating when all fragments comprising the original datagram 
are present. 
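
Because the fragments are kept sorted by IP offset, a completeness test can be expressed as a single pass over the list. The following is a minimal sketch of that idea under stated assumptions, not the ASL implementation: FragInfo and is_complete are hypothetical names, offsets are expressed in bytes, and the input is assumed to be already sorted by offset.

```cpp
#include <vector>

// Hypothetical summary of one queued fragment (not an ASL type).
struct FragInfo {
    int  offset;   // payload offset in bytes
    int  length;   // payload length in bytes
    bool more;     // "more fragments" flag from the IP header
};

// Returns true if the fragments, sorted by offset, cover the datagram
// from offset 0 to the end without holes, and the last one is final.
bool is_complete(const std::vector<FragInfo>& frags) {
    if (frags.empty()) return false;
    int next = 0;                                  // first byte not yet covered
    for (const FragInfo& f : frags) {
        if (f.offset > next) return false;         // hole before this fragment
        if (f.offset + f.length > next) next = f.offset + f.length;
    }
    return !frags.back().more;                     // final fragment must be present
}
```

Overlapping fragments are tolerated here; a fuller version would also report the overlap conditions listed for insert above.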
head 
Description 

The function head returns the address of the first IP fragment in the datagram's linked 
list of fragments. 
Syntax 

IP4Fragment* head ( ) ; 
Parameters 
None. 
Return value 

Returns the address of the first IP fragment in the datagram's linked list of fragments. 
10. UDP Support 

The UDP protocol provides a best-effort datagram service. Due to its limited complexity, 
only the simple UDP header definitions are included here. Additional functions operating on 
several protocols (e.g. UDP and TCP NAT) are defined in subsequent sections. 

11. UDP Header 

The UDPHeader class defines the standard UDP header. It is defined in NBudp.h. In 
addition to the standard UDP header, the class includes a single method for convenience in 
accessing the payload portion of the UDP datagram. The class contains no virtual functions, and 
therefore pointers to the UDPHeader class may be used to point to UDP headers received in live 
network packets. 

The class contains a number of member functions, most of which provide direct access to 
the header fields. A special payload function may be used to obtain a pointer immediately 
beyond the UDP header. The following table lists the functions providing direct access to the 
header fields: 



Function   Return Type   Description
sport()    nuint16&      Returns a reference to the source UDP port number.
dport()    nuint16&      Returns a reference to the destination UDP port number.
len()      nuint16&      Returns a reference to the UDP length field.
cksum()    nuint16&      Returns a reference to the UDP pseudoheader checksum. UDP
                         checksums are optional; a value of all zero bits indicates
                         that no checksum was computed.



The following function provides convenient access to the payload portion of the datagram, and 
maintains consistency with other protocol headers (i.e. IP and TCP). 
payload 
Description 

The function payload returns the address of the first byte of data (beyond the UDP 
header). 
Syntax 

unsigned char* payload (); 

Parameters 

None. 

Return value 

Returns the address of the first byte of payload data in the UDP packet. 

12. TCP Support 

The TCP protocol provides a stateful connection-oriented stream service. The ASL 
provides the TCP-specific definitions, including the TCP header, plus a facility to monitor the 
content and progress of an active TCP flow as a third party (i.e. without having to be an 
endpoint). For address and port number translation of TCP, see the section on NAT in 
subsequent sections of this document. 

13. TCP Sequence Numbers 

TCP uses sequence numbers to keep track of an active data transfer. Each unit of data 
transfer is called a segment, and each segment contains a range of sequence numbers. In TCP, 
sequence numbers are in byte units. If a TCP connection is open and data transfer is progressing 
from computer A to B, TCP segments will be flowing from A to B and acknowledgements will 
be flowing from B toward A. The acknowledgements indicate to the sender the amount of data 
the receiver has received. TCP is a bi-directional protocol, so that data may be flowing 
simultaneously from A to B and from B to A. In such cases, each segment (in both directions) 
contains data for one direction of the connection and acknowledgements for the other direction 
of the connection. Both sequence numbers (sending direction) and acknowledgement numbers 
(reverse direction) use TCP sequence numbers as the data type in the TCP header. TCP 
sequence numbers are 32-bit unsigned numbers that are allowed to wrap beyond 2^32-1. Within 
the ASL, a special class called TCPSeq defines this type and associated operators, so that 
objects of this type may be treated like ordinary scalar types (e.g. unsigned integers). 
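
The wrap-around behavior can be illustrated with a small stand-in class. The sketch below is not the ASL's TCPSeq; the class name Seq32 and its members are hypothetical, and it shows only the usual technique of comparing sequence numbers through the signed 32-bit difference, which gives correct ordering as long as the two values are less than 2^31 apart. The conversion to a signed value assumes two's-complement representation, which holds on common platforms.

```cpp
#include <cstdint>

// Hypothetical stand-in for a wrapping sequence number type (not TCPSeq).
class Seq32 {
public:
    explicit Seq32(std::uint32_t v = 0) : v_(v) {}
    std::uint32_t value() const { return v_; }

    // Advancing wraps modulo 2^32, as unsigned arithmetic does.
    Seq32 operator+(std::uint32_t n) const { return Seq32(v_ + n); }

    // Signed distance from o to *this; negative if *this is "before" o.
    std::int32_t operator-(Seq32 o) const {
        return static_cast<std::int32_t>(v_ - o.v_);
    }

    bool operator<(Seq32 o) const { return (*this - o) < 0; }
    bool operator==(Seq32 o) const { return v_ == o.v_; }

private:
    std::uint32_t v_;
};
```

With this definition, a sequence number just below 2^32 compares as less than the small value reached after wrapping, which a plain unsigned comparison would get wrong.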
14. TCP Header 

The TCPHeader class defines the standard TCP header. In addition to the standard 
TCP header, the class includes a set of methods for convenience in accessing the payload portion 
of the TCP stream. The class contains no virtual functions, and therefore pointers to the 
TCPHeader class may be used to point to TCP headers received in live network packets. 

The class contains a number of member functions, most of which provide direct access to 
the header fields. A special payload function may be used to obtain a pointer immediately 
beyond the TCP header. The following table lists the functions providing direct access to the 
header fields: 



Function   Return Type   Description
sport()    nuint16&      Returns a reference to the source TCP port number.
dport()    nuint16&      Returns a reference to the destination TCP port number.
seq()      TCPSeq&       Returns a reference to the TCP sequence number.
ack()      TCPSeq&       Returns a reference to the TCP acknowledgement number.
off()      nuint8        Returns the number of 32-bit words in the TCP header
                         (includes TCP options).
flags()    nuint8&       Returns a reference to the byte containing the 6 flag bits
                         (and 2 reserved bits).
win()      nuint16&      Returns a reference to the window advertisement field
                         (unscaled).
cksum()    nuint16&      Returns a reference to the TCP pseudoheader checksum. TCP
                         checksums are not optional.
urp()      nuint16&      Returns a reference to the TCP urgent pointer field.



The following functions provide convenient access to other characteristics of the segment: 
payload 

Description 

The function payload returns the address of the first byte of data (beyond the TCP 
header). 
Syntax 

unsigned char* payload(); 

Parameters 

None. 

Return value 

Returns the address of the first byte of payload data in the TCP packet. 
window 

Description 

The function window returns the window advertisement contained in the segment, 
taking into account the use of TCP large windows (see RFC 1323). 
Syntax 

uint32 window (int wshift) 
Parameters 



Parameter   Type   Description
wshift      int    The "window shift value" (number of left-shift bit positions
                   to scale window field)

Return value 

Returns the receiver's advertised window in the segment (in bytes). This function is to be 
used when RFC 1323-style window scaling is in use. 
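
The scaling itself is a left shift of the 16-bit window field. The following sketch shows the arithmetic under stated assumptions: scaled_window is a hypothetical free function rather than the ASL member function, the window value is assumed to be already in host byte order, and the shift is clamped to the maximum of 14 that RFC 1323 permits.

```cpp
#include <cstdint>

// Hypothetical helper: apply RFC 1323 window scaling to a raw window field.
std::uint32_t scaled_window(std::uint16_t win, int wshift) {
    if (wshift < 0) wshift = 0;
    if (wshift > 14) wshift = 14;      // RFC 1323 limits the shift count to 14
    return static_cast<std::uint32_t>(win) << wshift;
}
```

A shift of 0 leaves the window unchanged, which corresponds to a connection on which window scaling was not negotiated.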
optbase 

Description 

The function optbase returns the address of the first option in the TCP header, if any 
are present. If no options are present, it returns the address of the first payload byte (which may 
be urgent data if the URG bit is set in the flags field). 
Syntax 

u_char* optbase () 

Parameters 

None. 

Return value 

Returns the address of the first byte of data beyond the urgent pointer field of the TCP 
header. 
hlen 

Description 

The first form of this function returns the TCP header length in bytes. The second 
form sets the TCP header length to the number of bytes specified. 
Syntax 

int hlen ( ) ; 

void hlen (int bytes); 

Parameters 

Parameter   Type   Description
bytes       int    Specifies the number of bytes present in the TCP header



Return value 

The first form returns the number of bytes in the TCP header. 
Definitions 

In addition to the TCP header itself, a number of definitions are provided for 
manipulating options in TCP headers: 



TCP Options 



Define                  Value   Description
TCPOPT_EOL              0       End of option list
TCPOPT_NOP              1       No operation (used for padding)
TCPOPT_MAXSEG           2       Maximum segment size
TCPOPT_SACK_PERMITTED   4       Selective Acknowledgements available
TCPOPT_SACK             5       Selective Acknowledgements in this segment
TCPOPT_TIMESTAMP        8       Time stamps
TCPOPT_CC               11      For T/TCP (see RFC 1644)
TCPOPT_CCNEW            12      For T/TCP
TCPOPT_CCECHO           13      For T/TCP



15. TCP Following 

TCP operates as an 11-state finite state machine. Most of the states are related to 
connection establishment and tear-down. By following certain control bits in the TCP headers of 
segments passed along a connection, it is possible to infer the TCP state at each endpoint, and to 
monitor the data exchanged between the two endpoints. 
Defines 

The following definitions are for TCP state monitoring, and indicate states in the TCP 
finite state machine: 

Define                    Value                        Description
TCPS_CLOSED               0                            Closed.
TCPS_LISTEN               1                            Listening for connection.
TCPS_SYN_SENT             2                            Active open, have sent SYN.
TCPS_SYN_RECEIVED         3                            Have sent and received SYN.
TCPS_ESTABLISHED          4                            Established.
TCPS_CLOSE_WAIT           5                            Received FIN, waiting for close.
TCPS_FIN_WAIT_1           6                            Have closed, sent FIN.
TCPS_CLOSING              7                            Closed, exchanged FIN; awaiting FIN ACK.
TCPS_LAST_ACK             8                            Had FIN and close; awaiting FIN ACK.
TCPS_FIN_WAIT_2           9                            Have closed, FIN is acked.
TCPS_TIME_WAIT            10                           In 2*MSL quiet wait after close.
TCPS_HAVERCVDSYN(s)       ((s) >= TCPS_SYN_RECEIVED)   True if state s indicates a SYN has been received.
TCPS_HAVEESTABLISHED(s)   ((s) >= TCPS_ESTABLISHED)    True if state s indicates the connection has ever been established.
TCPS_HAVERCVDFIN(s)       ((s) >= TCPS_TIME_WAIT)      True if state s indicates a FIN has ever been received.



Note 1: States less than TCPS_ESTABLISHED indicate connections not yet established. 
Note 2: States greater than TCPS_CLOSE_WAIT are those where the user has closed. 
Note 3: States greater than TCPS_CLOSE_WAIT and less than TCPS_FIN_WAIT_2 await 
ACK of FIN. 
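
The state definitions and the notes above can be exercised with a short example. The macro definitions are copied from the table; the helper tcps_user_has_closed is a hypothetical addition that expresses Note 2 and is not part of the ASL.

```cpp
// State values and macros as listed in the table above.
enum {
    TCPS_CLOSED = 0, TCPS_LISTEN = 1, TCPS_SYN_SENT = 2,
    TCPS_SYN_RECEIVED = 3, TCPS_ESTABLISHED = 4, TCPS_CLOSE_WAIT = 5,
    TCPS_FIN_WAIT_1 = 6, TCPS_CLOSING = 7, TCPS_LAST_ACK = 8,
    TCPS_FIN_WAIT_2 = 9, TCPS_TIME_WAIT = 10
};

#define TCPS_HAVERCVDSYN(s)     ((s) >= TCPS_SYN_RECEIVED)
#define TCPS_HAVEESTABLISHED(s) ((s) >= TCPS_ESTABLISHED)
#define TCPS_HAVERCVDFIN(s)     ((s) >= TCPS_TIME_WAIT)

// Hypothetical helper expressing Note 2: states greater than
// TCPS_CLOSE_WAIT are those where the user has closed.
inline bool tcps_user_has_closed(int s) { return s > TCPS_CLOSE_WAIT; }
```

Because the states are numbered in roughly the order a connection passes through them, each test reduces to a single integer comparison.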

The TCPSegInfo Class 

The TCPSegInfo class is a container class for TCP segments that have been queued 
during TCP stream reconstruction and may be read by applications (using the 
ReassemblyQueue::read function, defined below). When segments are queued, they are 
maintained in a doubly-linked list sorted by sequence number order. Note that the list may 
contain "holes". That is, it may contain segments that are not adjacent in the space of sequence 
numbers because some data is missing in between. In addition, because retransmitted TCP 
segments can potentially overlap one another's data areas, the starting and ending sequence 
number fields (startseq_ and endseq_) may not correspond to the starting sequence 
number 



The class contains the following fields, all of which are declared public: 



Field       Type           Description
next_       TCPSegInfo*    Pointer to the next TCPSegInfo object of the forward linked
                           list; NULL if no more
prev_       TCPSegInfo*    Pointer to the previous TCPSegInfo object of the reverse
                           linked list; NULL if no previous segment exists
segment_    IP4Datagram*   Pointer to the datagram containing the TCP segment
startseq_   TCPSeq         The starting sequence number for the segment
endseq_     TCPSeq         The ending sequence number for the segment
startbuf_   u_char*        Pointer to the byte whose sequence number is specified by the
                           startseq_ field
endbuf_     u_char*        Pointer to the byte whose sequence number is specified by the
                           endseq_ field
flags_      uint32         Flags field for the segment (reserved as of the EA2 release)



The ReassemblyQueue Class 

The ReassemblyQueue class is a container class used in reconstructing TCP streams 
from TCP segments that have been "snooped" on a TCP connection. This class contains a list of 
TCPSegInfo objects, each of which corresponds to a single TCP segment. The purpose of this 
class is not only to contain the segments, but to reassemble received segments as they arrive and 
present them in proper sequence number order for applications to read. Applications are 
generally able to read data on the connection in order, or to skip past some fixed amount of 
enqueued data. 
Constructor 
Description 

A ReassemblyQueue object is used internally by the TCP stream reconstruction 
facility, but may be useful to applications in general under some circumstances. It provides for 
reassembly of TCP streams based on sequence numbers contained in TCP segments. The 
constructor takes an argument specifying the next sequence number to expect. It is updated as 
additional segments are inserted into the object. If a segment is inserted which is not contiguous 
in sequence number space, it is considered "out of order" and is queued in the object until the 
"hole" (data between it and the previous in-sequence data) is filled. 
Syntax 

ReassemblyQueue (TCPSeq& rcvnxt) 
Parameters 



Parameter 


Type 


Description 


rcvnxt 


TCPSeq& 


A reference to the next TCP sequence number to expect. The sequence 
number referred to by rcvnxt is updated by the add function (see 
below) to always indicate the next in-order TCP sequence number 
expected 



Return value 

None . 

Defines 

The following definitions are provided for insertion of TCP segments into a 
ReassemblyQueue object, and are used as return values for the add function defined below. 
Generally, acceptable conditions are indicated by bits in the low-order half-word, and suspicious 
or error conditions are indicated in the upper half-word. 
Define           Value        Description
RQ_OK            0x00000000   Segment was non-overlapping and in order
RQ_OUTORDER      0x00000001   Segment was out of order (didn't match next
                              expected sequence number)
RQ_LOW_OLAP      0x00000002   Segment's sequence number was below next
                              expected but segment extended past next expected
RQ_HIGH_OLAP     0x00000004   Segment's data overlapped another queued
                              segment's data
RQ_DUP           0x00000008   Completely duplicate segment
RQ_BADHLEN       0x00010000   Bad header length (e.g. less than 5)
RQ_BADRSVD       0x00020000   Bad reserved field (reserved bits are non-zero)
RQ_FLAGSALERT    0x00040000   Suspicious combination of flags (e.g. RST on or all
                              on, etc.)
RQ_FLAGSBADURP   0x00080000   Bad urgent pointer
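
Because acceptable conditions occupy the low-order half-word and suspicious or error conditions the upper half-word, a caller can separate the two with a mask. The sketch below illustrates this; the mask name RQ_SUSPECT_MASK and the two helper functions are hypothetical and not part of the ASL, and only a subset of the definitions is repeated.

```cpp
#include <cstdint>

// Values copied from the table above.
constexpr std::uint32_t RQ_OK       = 0x00000000;
constexpr std::uint32_t RQ_OUTORDER = 0x00000001;
constexpr std::uint32_t RQ_LOW_OLAP = 0x00000002;
constexpr std::uint32_t RQ_DUP      = 0x00000008;

// Hypothetical mask covering the upper half-word (suspicious/error bits).
constexpr std::uint32_t RQ_SUSPECT_MASK = 0xFFFF0000;

// True if any suspicious or error condition was reported.
constexpr bool rq_suspicious(std::uint32_t result) {
    return (result & RQ_SUSPECT_MASK) != 0;
}

// True if the segment arrived in order and nothing suspicious was seen.
constexpr bool rq_in_order(std::uint32_t result) {
    return (result & (RQ_SUSPECT_MASK | RQ_OUTORDER)) == 0;
}
```

An application that only monitors traffic might log results for which rq_suspicious is true and otherwise ignore the low-order bits.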



add 

Description 

The add function inserts an IP datagram or complete IP fragment containing a TCP 
segment into the reassembly queue. The TCP sequence number referenced by rcvnxt in the 
constructor is updated to reflect the next in-sequence sequence number expected. 
Syntax 

int add ( IP4Datagram* dp, TCPSeq seq, uint32 dlen); 
int add ( IP4Fragment* fp, TCPSeq seq, uint32 dlen); 



Parameters 



Parameter 


Type 


Description 


fp 


IP4Fragment* 


Pointer to an unfragmented IP fragment containing a TCP 
segment 


dp 


IP4Datagram* 


A pointer to a complete IP datagram containing a TCP segment 


seq 


TCPSeq 


Initial sequence number for the TCP segment 


dlen 


uint32 


Usable length of the TCP segment 



Return value 

Returns a 32-bit integer with the possible values indicated above (definitions beginning 
with RQ_). 
empty 
Description 

The empty function returns true if the reassembly queue contains no segments. 

Syntax 

bool empty () 
Parameters 
None. 
Return value 

Returns true if the reassembly queue contains no segments. 
clear 

Description 

The clear function removes all queued segments from the reassembly queue and frees 
their storage. 
Syntax 

void clear () 
Parameters 
None. 
Return value 

None. 

read 
Description 

The read function provides application access to the contiguous data currently queued 
in the reassembly queue. The function returns a linked list of TCPSegInfo objects. The list is 
sorted by sequence number, beginning with the first in-order sequence number, and 
continues no further than the number of bytes specified by the caller. Note that the caller must 
inspect the value filled in by the call to determine how many bytes' worth of sequence number 
space is consumed by the linked list. This call removes the segments returned to the caller from 
the reassembly queue. 
Syntax 

TCPSegInfo* read(int& len); 
Parameters 



Parameter 


Type 


Description 


len 


int& 


Contains the number of bytes worth of in-sequence data the application 
is interested in reading from the reassembly queue. The underlying 
integer is modified by this call to indicate the number of bytes actually 
covered by the list of segments returned. The call is guaranteed to 
never return a larger number of bytes than requested. 



Return value 

Returns a pointer to the first TCPSegInfo object in a doubly-linked list of objects, each 
of which points to a TCP segment that is numerically adjacent in TCP sequence number space. 
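
The interplay of add, rcvnxt, and read can be sketched with a much-simplified queue. The class below is an illustration under stated assumptions and not the ASL's ReassemblyQueue: the name MiniReassembly is hypothetical, payloads are held as strings rather than datagrams, read returns the bytes directly instead of a TCPSegInfo list, add returns only an in-order indication rather than the RQ_ bits, and streams are assumed to be shorter than 2^32 bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified reassembly queue (not the ASL class).
class MiniReassembly {
public:
    explicit MiniReassembly(std::uint32_t& rcvnxt)
        : rcvnxt_(rcvnxt), base_(rcvnxt), read_rel_(0) {}

    // Queue a segment; returns true if it began at the expected sequence number.
    bool add(std::uint32_t seq, std::string data) {
        const bool in_order = (seq == rcvnxt_);
        std::uint32_t rel = seq - base_;                 // offset from initial rcvnxt
        const std::uint32_t next_rel = rcvnxt_ - base_;
        if (rel < next_rel) {                            // begins in data already seen
            const std::uint32_t old = next_rel - rel;
            if (old >= data.size()) return false;        // complete duplicate
            data.erase(0, old);
            rel = next_rel;
        }
        store(rel, std::move(data));
        std::uint32_t covered = next_rel;                // advance over contiguous data
        for (const auto& kv : pending_) {
            if (kv.first > covered) break;               // hole: stop here
            const std::uint32_t end =
                static_cast<std::uint32_t>(kv.first + kv.second.size());
            if (end > covered) covered = end;
        }
        rcvnxt_ = base_ + covered;
        return in_order;
    }

    // Remove and return up to max_len bytes of contiguous, in-order data.
    std::string read(std::size_t max_len) {
        std::string out;
        while (!pending_.empty() && out.size() < max_len) {
            auto it = pending_.begin();
            if (it->first > read_rel_) break;            // hole before next segment
            const std::uint32_t skip = read_rel_ - it->first;
            std::string seg = std::move(it->second);
            pending_.erase(it);
            if (skip >= seg.size()) continue;            // entirely delivered already
            const std::size_t take =
                std::min(seg.size() - skip, max_len - out.size());
            out.append(seg, skip, take);
            read_rel_ += static_cast<std::uint32_t>(take);
            if (skip + take < seg.size())                // keep the unread tail queued
                store(read_rel_, seg.substr(skip + take));
        }
        return out;
    }

    bool empty() const { return pending_.empty(); }

private:
    // Keep the longer payload if two segments start at the same offset.
    void store(std::uint32_t rel, std::string data) {
        if (data.empty()) return;
        std::string& slot = pending_[rel];
        if (slot.size() < data.size()) slot = std::move(data);
    }

    std::uint32_t& rcvnxt_;                              // caller's next-expected seq
    std::uint32_t base_;                                 // rcvnxt at construction
    std::uint32_t read_rel_;                             // next offset to hand out
    std::map<std::uint32_t, std::string> pending_;       // offset -> payload
};
```

As with the class described above, an out-of-order segment is held until the hole before it is filled, the caller's rcvnxt variable is advanced as contiguous data accumulates, and a read never returns more bytes than requested.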
The TCPEndpoint Class 

The TCPEndpoint class is the abstraction of a single endpoint of a TCP connection. In 
TCP, a connection is identified by a 4-tuple of two IP addresses and two port numbers. Each 
endpoint is identified by a single IP address and port number. Thus, a TCP connection (or 
"session"; see below) actually comprises two endpoint objects. Each endpoint contains the 
TCP finite state machine state as well as a ReassemblyQueue object, used to contain queued 
data. The TCPEndpoint class is used internally by the TCPSession class below, but may be 
useful to applications in certain circumstances. 

Constructor 

Description 

The TCPEndpoint class is created in an empty state and is unable to determine which 
endpoint of a connection it represents. The user should call the init function described below 
after object instantiation to begin use of the object. 
Syntax 

TCPEndpoint ( ) 
Parameters 
None. 
Return value 

None . 
Destructor 
Description 

Deletes all queued TCP segments and frees the object's memory. 

Syntax 

~TCPEndpoint() 
Parameters 
None. 
Return value 

None. 

reset 
Description 

Resets the endpoint internal state to closed and clears any queued data. 

Syntax 

void reset() 
Parameters 
None. 
Return value 

None . 

state 
Description 

Returns the current state in the TCP finite state machine associated with the TCP 
endpoint. 
Syntax 

int state () 
Parameters 
None. 
Return value 

Returns an integer indicating the internal state according to the definitions given above 
(defines beginning with TCPS_) 
init 
Description 

The init function provides initialization of a TCP endpoint object by specifying the IP 
address and port number the endpoint is acting as. After this call has been made, subsequent 
processing of IP datagrams and fragments containing TCP segments (and ACKs) is 
accomplished by the process calls described below. 

Syntax 

void init(IP4Addr* myaddr, nuint16 myport); 
Parameters 

Parameter   Type       Description
myaddr      IP4Addr*   A pointer to the IP address identifying this TCP endpoint
myport      nuint16    The port number (in network byte order) identifying
                       this TCP endpoint



Return value 
None. 
process 
Description 

The process function processes an incoming or outgoing TCP segment relative to the 
TCP endpoint object. The first form of the function operates on a datagram which must be 
complete; the second form operates on a fragment which must also be complete. Given that the 
TCPEndpoint object is not actually the literal endpoint of the TCP connection itself, it must 
infer state transitions at the literal endpoints based upon observed traffic. Thus, it must monitor 
both directions of the TCP connection to properly follow the state at each literal endpoint. 
Syntax 

int process(IP4Datagram* pd); 
int process(IP4Fragment* pf); 
Parameters 

Parameter   Type           Description
pd          IP4Datagram*   A pointer to a complete IP datagram containing a TCP segment
pf          IP4Fragment*   Pointer to an unfragmented IP fragment containing a TCP segment



Return value 

Returns a 32-bit integer with the same semantics defined for 
ReassemblyQueue::add (see above). 
The TCPSession Class 

The TCPSession class is the abstraction of a complete, bi-directional TCP connection. It 
includes two TCP endpoint objects, which each include a reassembly queue. Thus, provided the 
TCPSession object is able to process all data sent on the connection in either direction it will 
have a reasonably complete picture of the progress and data exchanged across the connection. 
Constructor 

Description 

The TCPSession object is created by the caller when a TCP segment arrives on a new 
connection. The session object will infer from the contents of the segment which endpoint will 
be considered the client (the active opener, generally the sender of the first SYN), and which 
will be considered the server (the passive opener, generally the sender of the first SYN+ACK). 
In circumstances of simultaneous active opens (a rare case when both endpoints send SYN 
packets), the notion of client and server is not well defined, but the session object will behave as 
though the sender of the first SYN received by the session object is the client. In any case, the 
terms client and server are only loosely defined and do not affect the proper operation of the 
object. 
Syntax 

TCPSession(IP4Datagram* dp); 
TCPSession(IP4Fragment* fp); 
Parameters 

Parameter   Type           Description
dp          IP4Datagram*   A pointer to a complete IP datagram containing the first TCP
                           segment on the connection
fp          IP4Fragment*   Pointer to a complete IP fragment containing the first TCP
                           segment on the connection



Return value 

None . 
Destructor 

Description 

Deletes all TCP segments queued and frees the object's memory. 

Syntax 

~TCPSession() 

Parameters 

None. 

Return value 

None . 

process 

Description 

The process function processes a TCP segment on the connection. The first form of 
the function operates on a datagram which must be complete; the second form operates on a 
fragment which must also be complete. This function operates by passing the datagram or 
fragment to each endpoint's process function.
Syntax

int process(IP4Datagram* pd);
int process(IP4Fragment* pf);
Parameters

Parameter   Type            Description
pd          IP4Datagram*    Pointer to a complete IP datagram containing a TCP segment
pf          IP4Fragment*    Pointer to a complete (unfragmented) IP fragment containing a
                            TCP segment



Return value 

Returns a 32-bit integer with the same semantics defined for
ReassemblyQueue::add (see above). The value returned will be the result of calling the
add function of the reassembly queue object embedded in the endpoint object corresponding to 
the destination address and port of the received segment. 

16. Network Address Translation (NAT) 

Network Address Translation (NAT) refers to the general ability to modify various fields 
of different protocols so that the effective source, destination, or source and destination entities 
are replaced by an alternative. The classes that perform NAT for the IP, UDP, and TCP
protocols are defined within the ASL. The NAT implementation uses incremental checksum
computation, so performance should not degrade in proportion to packet size.
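
The idea behind incremental checksum computation can be sketched as follows. This is a minimal illustration, not the ASL implementation: the function names full_checksum and incremental_checksum are hypothetical, and the update rule used is the standard one's-complement identity HC' = ~(~HC + ~m + m') from RFC 1624, which the source does not name explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Full one's-complement Internet checksum over an array of 16-bit words.
// The cost grows with the number of words summed.
inline std::uint16_t full_checksum(const std::uint16_t* words, std::size_t count) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += words[i];
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);  // fold carries back into the low 16 bits
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFFu);
}

// Incremental update after a single 16-bit word changes from old_word to
// new_word: HC' = ~(~HC + ~m + m'). The cost is constant, whatever the
// size of the packet the checksum covers.
inline std::uint16_t incremental_checksum(std::uint16_t old_sum,
                                          std::uint16_t old_word,
                                          std::uint16_t new_word) {
    std::uint32_t sum = static_cast<std::uint16_t>(~old_sum);
    sum += static_cast<std::uint16_t>(~old_word);
    sum += new_word;
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return static_cast<std::uint16_t>(~sum & 0xFFFFu);
}
```

A 32-bit address rewrite would apply the incremental step twice, once per 16-bit half of the address, and the same step applies to a TCP or UDP checksum whose pseudoheader includes that address.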

17. IP NAT 

IP address translation refers to the mapping of an IP datagram (fragment) with source and
destination IP address pair (s1, d1) to the same datagram (fragment) with a new address pair
(s2, d2). A source rewrite modifies only the source address (d1 is left equal to d2). A
destination rewrite modifies only the destination address (s1 is left equal to s2). A source
and destination rewrite changes both the source and destination IP addresses. Note that for
IP NAT, only the IP source and/or destination addresses are rewritten (in addition to
rewriting the IP header checksum). For traffic such as TCP or UDP, NAT functionality must
include modification of the TCP or UDP pseudoheader checksum (which covers the IP header
source and destination addresses plus the protocol field). Properly performing NAT on TCP or
UDP traffic requires attention to these details.
18. IP NAT Base Class 

The class IP4Nat provides a base class for other IP NAT classes. Because of the pure
virtual function rewrite, applications will not create objects of type IP4Nat directly, but
rather use objects of type IP4SNat, IP4DNat, and IP4SDNat, defined below.
rewrite 
Description 

This pure-virtual function is defined in derived classes. It performs address rewriting in a
specific fashion implemented by the specific derived classes (i.e. source, destination, or
source/destination combination). The rewrite call, as applied to a fragment, only affects the
given fragment. When applied to a datagram, each of the fragment headers comprising the
datagram is rewritten.
Syntax 

virtual void rewrite(IP4Datagram* dp) = 0;
virtual void rewrite(IP4Fragment* fp) = 0;
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to rewrite
fp          IP4Fragment*    Pointer to the single fragment to rewrite

Return value

None. 

There are three classes available for implementing IP NAT, all of which are derived from
the base class IP4Nat. The classes IP4SNat, IP4DNat, and IP4SDNat define the structure of
objects implementing source, destination, and source/destination rewriting for IP datagrams and
fragments.
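
The relationship between the abstract base class and its three derived classes can be sketched with a toy model. Everything here is hypothetical: ToyIpHeader, ToyNat, ToySrcNat, ToyDstNat, and ToySrcDstNat are illustrative stand-ins rather than the ASL types, addresses are plain 32-bit integers, and checksum maintenance is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an IP header; only the address fields matter here.
struct ToyIpHeader {
    std::uint32_t src;
    std::uint32_t dst;
};

// Abstract base: the pure virtual rewrite() prevents direct instantiation,
// so callers must pick one of the concrete rewriting policies below.
class ToyNat {
public:
    virtual ~ToyNat() = default;
    virtual void rewrite(ToyIpHeader& h) const = 0;
};

// Source rewrite: the destination address is left untouched.
class ToySrcNat : public ToyNat {
public:
    explicit ToySrcNat(std::uint32_t new_src) : new_src_(new_src) {}
    void rewrite(ToyIpHeader& h) const override { h.src = new_src_; }
private:
    std::uint32_t new_src_;
};

// Destination rewrite: the source address is left untouched.
class ToyDstNat : public ToyNat {
public:
    explicit ToyDstNat(std::uint32_t new_dst) : new_dst_(new_dst) {}
    void rewrite(ToyIpHeader& h) const override { h.dst = new_dst_; }
private:
    std::uint32_t new_dst_;
};

// Source and destination rewrite: both addresses are replaced.
class ToySrcDstNat : public ToyNat {
public:
    ToySrcDstNat(std::uint32_t new_src, std::uint32_t new_dst)
        : new_src_(new_src), new_dst_(new_dst) {}
    void rewrite(ToyIpHeader& h) const override {
        h.src = new_src_;
        h.dst = new_dst_;
    }
private:
    std::uint32_t new_src_;
    std::uint32_t new_dst_;
};
```

Code that holds only a reference to the base type can apply whichever policy was configured without knowing which addresses it changes.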

19. IP4SNat class 

The IP4SNat class is derived from the IP4Nat class. It defines the class of objects 
implementing source rewriting for IP datagrams and fragments. 
Constructor 

Description 

Instantiates the IP4SNat object. 
Syntax 

IP4SNat(IP4Addr* newsrc);
Parameters

Parameter   Type        Description
newsrc      IP4Addr*    Pointer to the new source address for IP NAT.



Return value 

None. 

rewrite 
Description 

Defines the pure virtual rewrite functions in the parent class.
Syntax

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified)



Return value 

None. 

20. IP4DNat class 

The IP4DNat class is derived from the IP4Nat class. It defines the class of objects 
implementing destination rewriting for IP datagrams and fragments. 
Constructor 

Description 

Instantiates the IP4DNat object. 
Syntax 

IP4DNat(IP4Addr* newdst);
Parameters

Parameter   Type        Description
newdst      IP4Addr*    Pointer to the new destination address for IP NAT.

Return value

None.

rewrite
Description 

Defines the pure virtual rewrite functions in the parent class. 

Syntax 

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified)



Return value 

None. 

21. IP4SDNat class 

The IP4SDNat class is derived from the IP4Nat class. It defines the class of objects 
implementing source and destination rewriting for IP datagrams and fragments. 
Constructor 

Description 

Instantiates the IP4SDNat object. 
Syntax 

IP4SDNat(IP4Addr* newsrc, IP4Addr* newdst);
Parameters

Parameter   Type        Description
newsrc      IP4Addr*    Pointer to the new source address for IP NAT.
newdst      IP4Addr*    Pointer to the new destination address for IP NAT.



Return value 

None.
rewrite 

Description 

Defines the pure virtual rewrite functions in the parent class. 

Syntax 

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified)



Return value 

None. 
Example 

For fragments, only the single fragment is modified. For datagrams, all comprising 
fragments are updated. The following simple example illustrates the use of one of these objects: 

Assuming ipa1 is an address we wish to place in the IP packet's destination address
field, buf points to the ASL buffer containing an IP packet we wish to rewrite, and iph points
to the IP header of the packet contained in the buffer:

IP4DNat *ipd = new IP4DNat(&ipa1);   // create IP DNat object
IP4Fragment ipf(buf, iph);           // create IP fragment object

ipd->rewrite(&ipf);                  // rewrite fragment's header

The use of other IP NAT objects follows a similar pattern. 

22. UDP NAT 

The organization of the UDP NAT classes follows the IP NAT classes very closely. The 
primary difference is in the handling of UDP ports. For UDP NAT, the optional rewriting of port 
numbers (in addition to IP layer addresses) is specified in the constructor. 

23. UDPNat base class 

The class UDPNat provides a base class for other UDP NAT classes. The constructor is 
given a value indicating whether port number rewriting is enabled. Because of the pure virtual 
function rewrite, applications will not create objects of type UDPNat directly, but rather use 
the objects of type UDPSNat, UDPDNat, and UDPSDNat defined below. 
Constructor 
Description 

The constructor is given a value indicating whether port number rewriting is enabled. 

Syntax 

UDPNat(bool doports);
Parameters

Parameter   Type    Description
doports     bool    Boolean value indicating whether port number rewriting is enabled.
                    A true value indicates port number rewriting is enabled.

Return value
None.
rewrite 

Description 

This pure-virtual function is defined in derived classes. It performs address rewriting in a 
specific fashion implemented by the specific derived classes (i.e. source, destination, or 
source/destination combination). The rewrite call, as applied to a fragment, only affects the 
given fragment. When applied to a datagram, each of the fragment headers comprising the
datagram is rewritten.
Syntax 

virtual void rewrite(IP4Datagram* dp) = 0;
virtual void rewrite(IP4Fragment* fp) = 0;
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to rewrite
fp          IP4Fragment*    Pointer to the single fragment to rewrite

Return value
None.
ports 
Description 

The first form of this function returns true if the NAT object is configured to rewrite port 
numbers. The second form of this function configures the object to enable or disable port 
number rewriting using the values true and false, respectively. 
Syntax

bool ports();

void ports(bool p);

Parameters

Parameter   Type    Description
p           bool    Boolean indicating whether port rewriting is enabled.



Return value 

The first form of this function returns true if the NAT object is configured to rewrite UDP 
port numbers. 

24. UDPSNat class 

The UDPSNat class is derived from the UDPNat class. It defines the class of objects 
implementing source address and (optionally) port number rewriting for complete and 
fragmented UDP datagrams. 
Constructors 
Description 

The single-argument constructor is used to create UDP NAT objects that rewrite only the 
addresses in the IP header (and update the IP header checksum and UDP pseudo-header 
checksum appropriately). The two-argument constructor is used to create NAT objects that also 
rewrite the source port number in the UDP header. For fragmented UDP datagrams, the port 
numbers will generally be present in only the first fragment. 
Syntax 

UDPSNat(IP4Addr* newsaddr, nuint16 newsport);

UDPSNat(IP4Addr* newsaddr);

Parameters

Parameter   Type        Description
newsaddr    IP4Addr*    Pointer to the new source address to be used
newsport    nuint16     The new source port number to be used



Return value 

None. 
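
Because the UDP header, and therefore the port numbers, generally travels only in the first fragment, a port rewrite has to be conditional on the fragment position. The sketch below illustrates that rule under stated assumptions: ToyUdpFragment and rewrite_source are hypothetical names, addresses and ports are plain integers, and checksum updates are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one IP fragment carrying UDP data.
struct ToyUdpFragment {
    std::uint32_t src_addr;     // IP source address, present in every fragment
    std::uint16_t frag_offset;  // fragment offset; 0 for the first fragment
    std::uint16_t src_port;     // meaningful only when frag_offset == 0
};

// Source rewrite: the address is always replaced. The port is replaced only
// when port rewriting is enabled and this fragment carries the UDP header.
inline void rewrite_source(ToyUdpFragment& f, std::uint32_t new_addr,
                           std::uint16_t new_port, bool do_ports) {
    f.src_addr = new_addr;
    if (do_ports && f.frag_offset == 0) {
        f.src_port = new_port;
    }
}
```

The do_ports flag plays the role that the doports constructor argument plays in the text: with it cleared, only the IP-layer address changes.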

rewrite 
Description 



Defines the pure virtual rewrite functions in the parent class. 

Syntax 

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified). Should only be called when the fragment
                            represents a complete UDP/IP datagram.



Return value 

None. 

25. UDPDNat class 

The UDPDNat class is derived from the UDPNat class. It defines the class of objects 
implementing destination address and (optionally) port number rewriting for complete and 
fragmented UDP datagrams. 
Constructors 
Description

The single-argument constructor is used to create UDP NAT objects that rewrite only the
addresses in the IP header (and update the IP header checksum and UDP pseudo-header
checksum appropriately). The two-argument constructor is used to create NAT objects that also
rewrite the destination port number in the UDP header. For fragmented UDP datagrams, the
port numbers will generally be present in only the first fragment.
Syntax

UDPDNat(IP4Addr* newdaddr, nuint16 newdport);

UDPDNat(IP4Addr* newdaddr);

Parameters

Parameter   Type        Description
newdaddr    IP4Addr*    Pointer to the new destination address to be used
newdport    nuint16     The new destination port number to be used



Return value 

None. 

rewrite 
Description 

Defines the pure virtual rewrite functions in the parent class. 

Syntax 

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified). Should only be called when the fragment
                            represents a complete UDP/IP datagram.

Return value 

None. 

26. UDPSDNat class 

The UDPSDNat class is derived from the UDPNat class. It defines the class of objects 
implementing source and destination address and (optionally) port number rewriting for 
complete and fragmented UDP datagrams. 
Constructors 
Description 

The two-argument constructor is used to create UDP NAT objects that rewrite only the 
addresses in the IP header (and update the IP header checksum and UDP pseudo-header 
checksum appropriately). The four-argument constructor is used to create NAT objects that also 
rewrite the source and destination port number in the UDP header. For fragmented UDP 
datagrams, the port numbers will generally be present in only the first fragment. 
Syntax 

UDPSDNat(IP4Addr* newsaddr, nuint16 newsport, IP4Addr* newdaddr,
         nuint16 newdport);

UDPSDNat(IP4Addr* newsaddr, IP4Addr* newdaddr);
Parameters

Parameter   Type        Description
newsaddr    IP4Addr*    Pointer to the new source address to be used
newsport    nuint16     The new source port number to be used
newdaddr    IP4Addr*    Pointer to the new destination address to be used
newdport    nuint16     The new destination port number to be used

Return value

None. 

rewrite 
Description 

Defines the pure virtual rewrite functions in the parent class. 

Syntax 

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified). Should only be called when the fragment
                            represents a complete UDP/IP datagram.



Return value 

None. 

27. TCP NAT 

The structure of the TCP NAT support classes follows the UDP classes very closely. The
primary difference is in the handling of TCP sequence and ACK numbers.

28. TCPNat base class 

The class TCPNat provides a base class for other TCP NAT classes. The constructor is
given a pair of values indicating whether port number rewriting and sequence/acknowledgement
number rewriting are enabled. Sequence number and ACK number rewriting are coupled such that
enabling sequence number rewriting for source-rewriting will modify the sequence number field
of the TCP segment, but enabling sequence number rewriting for destination-rewriting will
instead modify the ACK field. This arrangement makes it possible to perform NAT on TCP
streams without unnecessary complexity in the TCP NAT interface. Because of the pure virtual
function rewrite, applications will not create objects of type TCPNat directly, but rather use
the objects of type TCPSNat, TCPDNat, and TCPSDNat defined below.
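
The coupling between the rewrite direction and the field that is adjusted can be sketched as below. This is an illustration only: ToyTcpHeader, apply_offset, source_side_rewrite, and destination_side_rewrite are hypothetical names, and the offset is modeled as a signed 32-bit value applied modulo 2^32, which is an assumption about how a relative offset would interact with TCP's 32-bit sequence space.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified TCP header holding only the two fields of interest.
struct ToyTcpHeader {
    std::uint32_t seq;  // sequence number
    std::uint32_t ack;  // acknowledgement number
};

// Adds a signed offset to a 32-bit TCP number; unsigned arithmetic wraps
// modulo 2^32, so positive and negative offsets are both handled.
inline std::uint32_t apply_offset(std::uint32_t value, std::int32_t offset) {
    return static_cast<std::uint32_t>(value + static_cast<std::uint32_t>(offset));
}

// Source-side rewriting adjusts the sequence number field.
inline void source_side_rewrite(ToyTcpHeader& h, std::int32_t seq_off) {
    h.seq = apply_offset(h.seq, seq_off);
}

// Destination-side rewriting adjusts the ACK field instead.
inline void destination_side_rewrite(ToyTcpHeader& h, std::int32_t ack_off) {
    h.ack = apply_offset(h.ack, ack_off);
}
```

Keeping the two adjustments in separate operations mirrors the text: one flag enables both, and the kind of rewrite decides which field is touched.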
Constructor 
Description 

The constructor is given two values indicating whether port number rewriting and
sequence/ACK number rewriting are enabled.

Syntax

TCPNat(bool doports, bool doseqs);
Parameters

Parameter   Type    Description
doports     bool    Boolean value indicating whether port number rewriting is enabled.
                    A true value indicates port number rewriting is enabled.
doseqs      bool    Boolean value indicating whether sequence/ACK number rewriting is
                    enabled. A true value indicates sequence/ACK number rewriting is
                    enabled.

Return value
None.
rewrite 
Description 

This pure-virtual function is defined in derived classes. It performs address rewriting in a
specific fashion implemented by the specific derived classes (i.e. source, destination, or
source/destination combination). The rewrite call, as applied to a fragment, only affects the
given fragment. When applied to a datagram, each of the fragment headers comprising the
datagram is rewritten.

Syntax 

virtual void rewrite ( IP4Datagram* dp) = 0; 
virtual void rewrite ( IP4Fragment* fp) = 0; 
Parameters 



Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to rewrite
fp          IP4Fragment*    Pointer to the single fragment to rewrite

Return value
None.
ports 
Description 

The first form of this function returns true if the NAT object is configured to rewrite port 
numbers. The second form of this function configures the object to enable or disable port 
number rewriting using the values true and false, respectively. 
Syntax 

bool ports();

void ports(bool p);

Parameters

Parameter   Type    Description
p           bool    Boolean indicating whether port number rewriting is enabled.

Return value

The first form of this function returns true if the NAT object is configured to rewrite TCP
port numbers.
seqs 
Description 

The first form of this function returns true if the NAT object is configured to rewrite 
sequence/ACK numbers. The second form of this function configures the object to enable or 
disable sequence/ACK number rewriting using the values true and false, respectively. 
Syntax 

bool seqs ( ) ; 

void seqs (bool s) ; 

Parameters 



Parameter   Type    Description
s           bool    Boolean indicating whether sequence/ACK number rewriting is
                    enabled.

Return value

The first form of this function returns true if the NAT object is configured to rewrite TCP
sequence/ACK numbers.

29. TCPSNat class 

The TCPSNat class is derived from the TCPNat class. It defines the class of objects
implementing source address and (optionally) port number and sequence number rewriting for
complete and fragmented TCP segments.
Constructors
Description

The single-argument constructor is used to create TCP NAT objects that rewrite only the 
addresses in the IP header (and update the IP header checksum and TCP pseudo-header 
checksum appropriately). The two-argument constructor is used to create NAT objects that also 
rewrite the source port number in the TCP header. The three-argument constructor is used to 
rewrite the IP address, source port number, and to modify the TCP sequence number by a 
relative (constant) amount. The sequence offset provided may be positive or negative. 
Syntax 

TCPSNat(IP4Addr* newsaddr);

TCPSNat(IP4Addr* newsaddr, nuint16 newsport);

TCPSNat(IP4Addr* newsaddr, nuint16 newsport, long seqoff);

Parameters

Parameter   Type        Description
newsaddr    IP4Addr*    Pointer to the new source address to be used
newsport    nuint16     The new source port number to be used
seqoff      long        Relative change to make to TCP sequence number fields. A positive
                        value indicates the TCP sequence number is increased by the
                        amount specified. A negative value indicates the sequence number
                        is reduced by the amount specified.



Return value 

None. 

rewrite 
Description 

Defines the pure virtual rewrite functions in the parent class.
Syntax

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified). Should only be called when the fragment
                            represents a complete TCP/IP segment.



Return value 

None. 

30. TCPSDNat class 

The TCPSDNat class is derived from the TCPNat class. It defines the class of objects
implementing source and destination address and (optionally) port number and sequence
number/ACK number rewriting for complete and fragmented TCP segments.
Constructors 

Description 

The two-argument constructor is used to create TCP NAT objects that rewrite only the
addresses in the IP header (and update the IP header checksum and TCP pseudo-header
checksum appropriately). The four-argument constructor is used to create NAT objects that also
rewrite the source and destination port numbers in the TCP header. The six-argument
constructor is used to rewrite the IP addresses and port numbers and to modify the TCP
sequence and ACK numbers by relative (constant) amounts. The offsets provided may be positive
or negative.
Syntax

TCPSDNat(IP4Addr* newsaddr, IP4Addr* newdaddr);

TCPSDNat(IP4Addr* newsaddr, nuint16 newsport, IP4Addr* newdaddr,
         nuint16 newdport);

TCPSDNat(IP4Addr* newsaddr, nuint16 newsport, long seqoff,
         IP4Addr* newdaddr, nuint16 newdport, long ackoff);
Parameters

Parameter   Type        Description
newsaddr    IP4Addr*    The new source address to be used
newsport    nuint16     The new source port number to be used
seqoff      long        Relative change to make to TCP sequence number fields. A positive
                        value indicates the TCP sequence number is increased by the
                        amount specified. A negative value indicates the sequence number
                        is reduced by the amount specified.
newdaddr    IP4Addr*    The new destination address to be used
newdport    nuint16     The new destination port number to be used
ackoff      long        Relative change to make to TCP ACK number fields. A positive value
                        indicates the TCP ACK number is increased by the amount
                        specified. A negative value indicates the ACK number is reduced
                        by the amount specified.



Return value 

None. 

rewrite 
Description 

Defines the pure virtual rewrite functions in the parent class. 

Syntax 

void rewrite(IP4Datagram* dp);
void rewrite(IP4Fragment* fp);
Parameters

Parameter   Type            Description
dp          IP4Datagram*    Pointer to the datagram to be rewritten (all fragment headers
                            are modified)
fp          IP4Fragment*    Pointer to the fragment to rewrite (only the single fragment
                            header is modified). Should only be called when the fragment
                            represents a complete TCP/IP segment.



Return value 

None. 



NetBoost Platform 

Figures 1-4 depict a sample platform for use with the compiler described above. 

Figure 1 shows a Network Infrastructure Application, called Application 2, being
deployed on an Application Processor (AP) 4 running a standard operating system. The policy
enforcement section of the Application 2, called Wire Speed Policy 3, runs on the Policy Engine
(PE) 6. The Policy Engine 6 transforms the inbound Packet Stream 8 into the outbound Packet
Stream 10 per the Wire Speed Policy 3. Communication from the Application Processor 4 to
the Policy Engine 6, in addition to the Wire Speed Policy 3, consists of control, policy
modifications and packet data as desired by the Application 2. Communication from the Policy
Engine 6 to the Application Processor 4 consists of status, exception conditions and packet data
as desired by the Application 2.

In a preferred embodiment of a Policy Engine (PE) according to the present invention, 
the PE provides a highly programmable platform for classifying network packets and 
implementing policy decisions about those packets at wire speed. Certain embodiments provide 
two Fast Ethernet ports and implement a pipelined dataflow architecture with store-and-forward.
Packets are run through a Classification Engine (CE) which executes a programmed series of
hardware assist operations such as chained field comparisons and generation of checksums and 
hash table pointers, then are handed to a microprocessor ("Policy Processor" or PP) for execution 
of policy decisions such as Pass, Drop, Enqueue/Delay, (de/en)capsulate, and (de/en)crypt based 
on the results from the CE. Some packets which require higher level processing may be sent to 
the host computer system ("Application Processor" or AP). (See Figure 4.) An optional 
cryptographic ("Crypto") Processor is provided for accelerating such functions as encryption and 
key management. 

CE programs can be written directly in binary; however for programmer convenience a 
microassembly language uasm has been developed which allows a microword to be constructed 
by declaring fields and their values in a symbolic form. The set of common microwords for the 
intended use of the CE have also been described in a higher-level CE Assembly Language called 
masm which allows the programmer to describe operations in a register-transfer format and to 
describe concurrent operations without having to worry about the details of microcode control of 
the underlying hardware. Both of these languages can be used by a programmer or can be 
generated automatically from a compiler which translates CE programs from a higher-level 
language such as NetBoost Classification Language (NCL). 

A sample system level block diagram is shown in FIG. 4. 

Figure 4 shows an application processor 302 which contains a host interface 304 to a PCI
bus 324. Fanout of the PCI bus 324 to a larger number of loads is accomplished with PCI-to-PCI
Bridge devices 306, 308, 310, and 312; each of those controls an isolated segment on a "child"
PCI bus 326, 328, 330, and 332 respectively. On three of these isolated segments 326, 328, and
330 is a number of Policy Engines 322; each Policy Engine 322 connects to two Ethernet ports
320 which connect the Policy Engine 322 to a network segment.

One of the PCI-to-PCI Bridges 312 controls child PCI bus 332, which provides the
Application Processor 302 with connection to standard I/O devices 314 and optionally to PCI
expansion slots 316 into which additional PCI devices can be connected.

In a smaller configuration of the preferred embodiment of the invention the number of 
Policy Engines 322 does not exceed the maximum load allowed on a PCI bus 324; in that case 
the PCI-to-PCI bridges 306, 308, and 310 are eliminated and up to four Policy Engines 322 are 
connected directly to the host PCI bus 324, each connecting also to two Ethernet ports 320. This 
smaller configuration may still have the PCI-to-PCI Bridge 312 present to isolate Local I/O 314 
and expansion slots 316 from the PCI bus 324, or the Bridge 312 may also be eliminated and the 
devices 314 and expansion 316 may also be connected directly to the host PCI bus 324. 

In certain embodiments, the PE utilizes two Fast Ethernet MACs (Media Access
Controllers) with IEEE 802.3 standard Media Independent Interface ("MII") connections to
external physical media (PHY) devices which attach to Ethernet segments. Each Ethernet MAC
receives packets into buffers addressed by buffer pointers obtained from a producer-consumer 
ring and then passes the buffer (that is, passes the buffer pointer) to a Classification Engine for 
processing, and from there to the Policy Processor. The "buffer pointer" is a data structure 
comprising the address of a buffer and a software-assigned "tag" field containing other 
information about that buffer. The "buffer pointer" is a fundamental unit of communication 
among the various hardware and software modules comprising a PE. From the PP, there are 
many paths the packet can take, depending on what the application(s) running on the PP decide is 
the proper disposition of that packet. It can be transmitted, sent to Crypto, delayed in memory, 
passed through a Classification Engine again for further processing, or copied from the PE's
memory over the PCI bus to the host's memory or to a peer device's memory, using the DMA
engine. The PP may also gather statistics on that packet into records in a hash table or in general 
memory. A pointer to the buffer containing both the packet and data structures describing that 
packet is passed around among the various modules. 
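
One way to picture a buffer pointer that combines an address with a tag is to pack both into a single 32-bit word. The layout below is an assumption made for illustration, not the PE's documented format: because buffers are 2KB in size, a 2KB-aligned address leaves its low 11 bits free, and this sketch stores the tag there. The names kBufferSize, kTagMask, kAddrMask, make_buffer_pointer, buffer_address, and buffer_tag are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kBufferSize = 2048;          // 2KB buffers, per the text
constexpr std::uint32_t kTagMask = kBufferSize - 1;  // assumed: low 11 bits hold the tag
constexpr std::uint32_t kAddrMask = ~kTagMask;       // remaining bits hold the address

// Packs a 2KB-aligned buffer address and a software-assigned tag into one word.
inline std::uint32_t make_buffer_pointer(std::uint32_t addr, std::uint32_t tag) {
    return (addr & kAddrMask) | (tag & kTagMask);
}

// Recovers the buffer address from a packed buffer pointer.
inline std::uint32_t buffer_address(std::uint32_t bp) {
    return bp & kAddrMask;
}

// Recovers the tag from a packed buffer pointer.
inline std::uint32_t buffer_tag(std::uint32_t bp) {
    return bp & kTagMask;
}
```

Passing such a word between modules moves only four bytes, while the packet and its descriptive data stay in place in the buffer.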

The PP may choose to drop a packet, to modify the contents of the packet or to forward 
the packet to the AP or to a different network segment over the PCI Bus (e.g. for routing.) The 
AP or PP can create packets of its own for transmission. A 3rd-party NIC (Network Interface
Card) on the PCI bus can use the PE memory for receiving packets, and the PP and AP can then
cooperate to feed those packets into the classification stream, effectively providing acceleration 
for packets from arbitrary networks. When doing so, adjacent 2KB buffers can be concatenated 
to provide buffers of any size needed for a particular protocol. 

FIG. 2 illustrates packet flow according to certain embodiments of the present invention. 
Each box represents a process which is applied to a packet buffer and/or the contents of a packet 
buffer. The buffer management process involves buffer allocation 102 and the recovery of 
retired buffers 118. When buffer allocation 102 into an RX Ring occurs, the Policy Processor 
244 enqueues a buffer pointer into the RX Ring and thus allocates the buffer to the receive MAC 
216 or 230, respectively. Upon receiving a packet, the RX MAC controller 220 or 228 uses the 
buffer pointer at the entry in the RX ring structure which is pointed to by MFILL to identify a 
2KB section of memory 260 that it can use to store the newly received packet. This process of 
receiving a packet and placing it into a buffer is represented by physical receive 104 in FIG. 2. 

The RX MAC controller 220 or 228 increments the MFILL pointer modulo ring size to 
signal that the buffer whose pointer is in the RX Ring has been filled with a new packet plus 
receive status. The Ring Translation Unit 264 detects a difference between MFILL and
MCCONS and signals to the classification engine 238 or 242, respectively, for RX Ring, that a
newly received packet is ready for processing. The Classification Engine 238 or 242 applies 
Classification 106 to that packet and creates a description of the packet which is placed in the 
packet buffer software area, then increments MCCONS to indicate that it has completed 
classification 106 of that packet. The Ring Translation Unit 264 detects a difference between 
MCCONS and MPCONS and signals to the Policy Processor 244 that a classified packet is ready 
for action processing 108. 
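
The staged hand-off described above, in which each stage has work whenever its own index differs from the index of the stage before it, can be modeled in a few lines. This is a single-threaded sketch under stated assumptions: ToyRxRing, its method names, the ring size of 8, and the prod_ allocation index are hypothetical, while MFILL, MCCONS, and MPCONS correspond to the pointers named in the text. Real hardware would run the stages concurrently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-threaded model of an RX ring shared by one producer and three
// staged consumers (MAC, Classification Engine, Policy Processor).
class ToyRxRing {
public:
    static constexpr std::size_t kSize = 8;  // assumed ring size

    // Policy Processor allocates an empty buffer to the receive MAC.
    bool allocate(std::uint32_t buffer_pointer) {
        if (next(prod_) == mpcons_) return false;  // ring full
        slots_[prod_] = buffer_pointer;
        prod_ = next(prod_);
        return true;
    }

    // MAC: fills the buffer at MFILL if one has been allocated.
    bool mac_fill() {
        if (mfill_ == prod_) return false;
        mfill_ = next(mfill_);
        return true;
    }

    // Classification Engine: work is pending when MCCONS differs from MFILL.
    bool classify() {
        if (mccons_ == mfill_) return false;
        mccons_ = next(mccons_);
        return true;
    }

    // Policy Processor: a classified packet is ready when MPCONS differs
    // from MCCONS; dequeue returns its buffer pointer.
    bool dequeue(std::uint32_t& buffer_pointer) {
        if (mpcons_ == mccons_) return false;
        buffer_pointer = slots_[mpcons_];
        mpcons_ = next(mpcons_);
        return true;
    }

private:
    // Advances an index modulo the ring size.
    static std::size_t next(std::size_t i) { return (i + 1) % kSize; }

    std::array<std::uint32_t, kSize> slots_{};
    std::size_t prod_ = 0;    // hypothetical allocation index
    std::size_t mfill_ = 0;   // MFILL
    std::size_t mccons_ = 0;  // MCCONS
    std::size_t mpcons_ = 0;  // MPCONS
};
```

Because each index only ever chases the one ahead of it, a buffer cannot reach action processing before it has been received and classified.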

The Policy Processor 244 obtains the buffer pointer from the ring location by dequeueing 
that pointer from the RX Ring, and executes application-specific action code 108 to determine 
the disposition of the packet. The action code 108 may choose to send the packet to an Ethernet 
Transmit MAC 218 or 234 by enqueueing the buffer pointer on a TX Ring, respectively; the 
packet may or may not have been modified by the action code 108 prior to this. Alternatively the 
action code 108 may choose to send the packet to the attached cryptographic processor (Crypto) 
246 for encryption, decryption, compression, decompression, security key management, parsing 
of IPSEC headers, or other associated functions; this entire bundle of functions is described by 
Crypto 112. Alternatively the action code 108 may choose to copy the packet to a PCI peer 322 
or 3 14 or 3 1 6, or to the host memory 330, both paths being accomplished by the process 1 14 of 
creating a DMA descriptor as shown in Table 3 and then enqueuing the pointer to that descriptor 
into a DMA Ring by writing that pointer to DMA PROD, which triggers the DMA Unit 210 to 
initiate a transfer. Alternatively the action code 108 can choose to temporarily enqueue the 
packet for delay 110 in memory 260 that is managed by the action code 108. Finally, the action 
code 108 can choose to send a packet for further classification 106 on any of the Classification 
Engines 208, 212, 238, or 242, either because the packet has been modified or because there is 
additional classification which can be run on the packet. The action code 108 can command 
the Classification process 106 to execute that additional classification via flags in the RX Status 
Word, through the buffer's software area, or by use of tag bits in the 32-bit buffer pointer 
reserved for that use. 

Packets can arrive at the classification process 106 from additional sources besides 
physical receive 104. Classification 106 may receive a packet from the output of the Crypto 
processing 112, from the Application Processor 302 or from a PCI peer 322 or 314 or 316, or 
from the action code 108. 

Packets can arrive at the action code 108 from classification 106, from the Application 
Processor 302, from a PCI peer 322 or 314 or 316, from the output of the Crypto processing 112, 
and from a delay queue 110. Additionally the action code 108 can create a packet. The 
disposition options for these packets are the same as those described for the receive path, above. 

The Crypto processing 112 can receive a packet from the Policy Processor 244 as 
described above. The Application Processor 302 or a PCI peer 322 or 314 or 316 can also 
enqueue the pointer to a buffer onto a Crypto Ring to schedule that packet for Crypto processing 
112. 

The TX MAC 218 or 234 transmits packets whose buffer pointers have been enqueued on 
the corresponding TX Ring. Those pointers may have been enqueued by the action code 108 
running on the Policy Processor 244, by the Crypto processing 112, by the Application Processor 
302, or by a PCI peer 322 or 314 or 316. When the TX MAC controller 222 or 232 has retired a 
buffer either by successfully transmitting the packet it contains, or abandoning the transmit due 
to transmit termination conditions, it will optionally write back TX status and TX Timestamp if 
programmed to do so, then will increment MTCONS to indicate that this buffer has been retired. 
The Ring Translation Unit 264 detects that there is a difference between MTCONS and 
MTRECOV and signals to the Policy Processor 244 that the TX Ring has at least one retired 
buffer to recover; this triggers the buffer recovery process 118, which will dequeue the buffer 
pointer from the TX ring and either send the buffer pointer to Buffer Allocation 102 or will add 
the recovered buffer to a software-managed free list for later use by Buffer Allocation 102. 
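
The retirement and recovery steps above can be sketched in the same style. This is an illustrative model only: the class TxRing, the producer index named prod, and the full-ring check are hypothetical and do not appear in the description. The pointer names MTCONS and MTRECOV and the software-managed free list are taken from the text.

```python
class TxRing:
    """Toy model of one TX Ring: enqueue, retire (MTCONS), recover (MTRECOV)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.prod = 0     # hypothetical producer index used by whoever enqueues
        self.mtcons = 0   # advanced by the TX MAC controller when a buffer is retired
        self.mtrecov = 0  # advanced by the buffer recovery process

    def _next(self, pointer):
        return (pointer + 1) % self.size

    def enqueue(self, buffer_pointer):
        # Assumption: one slot is kept empty so that "full" and "empty" differ.
        if self._next(self.prod) == self.mtrecov:
            raise OverflowError("TX Ring is full")
        self.slots[self.prod] = buffer_pointer
        self.prod = self._next(self.prod)

    def mac_retire(self):
        # The MAC has either transmitted the packet or abandoned the transmit.
        if self.mtcons == self.prod:
            raise IndexError("no buffer is queued for transmit")
        self.mtcons = self._next(self.mtcons)

    def recovery_pending(self):
        # What the Ring Translation Unit detects: MTCONS differs from MTRECOV.
        return self.mtcons != self.mtrecov

    def recover(self, free_list):
        """Move every retired buffer pointer onto a software-managed free list."""
        recovered = 0
        while self.recovery_pending():
            free_list.append(self.slots[self.mtrecov])
            self.slots[self.mtrecov] = None
            self.mtrecov = self._next(self.mtrecov)
            recovered += 1
        return recovered


tx_ring = TxRing(size=4)
free_list = []
tx_ring.enqueue(0x1000)
tx_ring.mac_retire()
tx_ring.recover(free_list)
```

The sketch always appends to a free list; sending the pointer directly to Buffer Allocation 102, the other option named above, would replace the append with a hand-off to the allocator.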

It is also possible for a device in the PCI expansion slot 316 to play the role defined for 
the attached Crypto processor 246 performing crypto processing 112 via DMA 114 in this flow. 

FIG. 3 shows a Policy Engine ASIC block diagram according to certain embodiments of 
the present invention. 

The ASIC 290 contains an interface 206 to an external RISC microprocessor which is 
known as the Policy Processor 244. Internal to the RISC Processor Interface 206 are registers 
for all units in the ASIC 290 to signal status to the RISC Processor 244. 

There is an interface 204 to a host PCI Bus 280 which is used for movement of data into 
and out of the memory 260, and is also used for external access to control registers throughout 
the ASIC 290. The DMA unit 210 is the Policy Engine 322's agent for master activity on the PCI 
bus 280. Transactions by DMA 210 are scheduled through the DMA Ring. The Memory 
Controller 240 receives memory access requests from all agents in the ASIC and translates them 
to transactions sent to the Synchronous DRAM Memory 260. Addresses issued to the Memory 
Controller 240 will be translated by the Ring Translation Unit 264 if address bit 27 is a '1', or will 
be used untranslated by the memory controller 240 to access memory 260 if address bit 27 is a 
'0'. Untranslated addresses are also examined by the Mailbox Unit 262 and if the address 
matches the memory address of one of the mailboxes the associated mailbox status bit is set if 
the transaction is a write, or cleared if the transaction is a read. In addition to the dedicated rings 
in the Ring Translation Unit 264 which are described here, the Ring Translation Unit also 
implements 5 general-purpose communications rings COM[4:0] 226 which software can allocate 
as desired. The memory controller 240 also implements an interface to serial PROMs 270 for 
obtaining information about memory configuration, MAC addresses, board manufacturing 
information, Crypto Daughtercard identification and other information. 
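
The address decode described in this paragraph can be sketched as follows. The function and class names and the example mailbox address are hypothetical; only the use of address bit 27 to select translation, and the rule that a mailbox status bit is set on a write and cleared on a read, come from the description.

```python
TRANSLATE_BIT = 1 << 27  # address bit 27 selects the Ring Translation Unit


class MailboxUnit:
    """Tracks one status bit per mailbox address (addresses here are made up)."""

    def __init__(self, mailbox_addresses):
        self.status = {address: False for address in mailbox_addresses}

    def observe(self, address, is_write):
        # Set the status bit on a write to a mailbox, clear it on a read.
        if address in self.status:
            self.status[address] = is_write


def route_access(address, is_write, mailbox_unit):
    """Return which path a memory access takes, updating mailbox status if needed."""
    if address & TRANSLATE_BIT:
        # Bit 27 is '1': the Ring Translation Unit translates the address.
        return "ring_translation"
    # Bit 27 is '0': the address is used as-is and is examined by the Mailbox Unit.
    mailbox_unit.observe(address, is_write)
    return "untranslated"


mailboxes = MailboxUnit([0x0000_0100])
route_access(0x0000_0100, is_write=True, mailbox_unit=mailboxes)   # sets the bit
route_access(0x0000_0100, is_write=False, mailbox_unit=mailboxes)  # clears the bit
```

Because a write sets the bit and a read clears it, a reader can treat the status bit as a "new message posted" flag without any further handshake.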

The ASIC contains two Fast Ethernet MACs, MAC A and MAC B. Each contains a 
receive MAC 216 or 230, respectively, with associated control logic and an interface to the 
memory unit 220 or 228, respectively; and a transmit MAC 218 or 234 respectively with 
associated control logic and an interface to the memory unit 222 or 232, respectively. Also 
associated with each MAC is an RMON counter unit 224 or 236, respectively, which counts 
certain aspects of all packets received and transmitted in support of providing the Ethernet MIB 
as defined in Internet Engineering Task Force (IETF) standard RFC 1213 and related RFCs. 

There are four Classification Engines 208, 212, 238, and 242 which are 
microprogrammed processors optimized for the predicate analysis associated with packet 
filtering. Packets are scheduled for processing by these engines through the use of their 
respective Reclassify Rings. In addition, the RX MAC controllers MAC A 220 and MAC B 228 can 
schedule packets for processing by Classification Engines 238 and 242, respectively, through 
use of their respective RX Rings. 

There is a Crypto Processor Interface 202 which enables attachment of an encryption 
processor 246. The RISC Processor 244 can issue reads and writes to the Crypto Processor 246 
through this interface, and the Crypto Processor 246 can access SDRAM 260 and control and 
status registers internal to the interface 202 through use of interface 202. 

A Timestamp counter 214 is driven by a stable oscillator 292 and is used by the RX MAC 
logic 220 and 228, the TX MAC logic 222 and 232, the Classification Engines 208, 212, 238, 
and 242, the Crypto Processor 246, and the Policy Processor 244 to obtain timestamps during 
processing of packets. 

Further details of this system have been described in a patent entitled "Packet Processing 
System including a Policy Engine having a Classification Unit" filed 06/15/1998, now U.S. Pat. 
No. 6,157,955. Some details about the NetBoost sample platform have been omitted to confine 
the disclosure to details related to the overall sample software environment. 

Those skilled in the art will appreciate variations of the above described embodiments. 
In addition to these embodiments, other variations will be appreciated by those skilled in the art. 
As such, the scope of the invention is not limited to the specified embodiments, but is defined by 
the following claims. 
Abstract 

The present invention relates to a general-purpose programmable packet processing 
platform for accelerating network infrastructure applications which have been structured so as to 
separate the stages of classification and action. Network packet classification, execution of 
actions upon those packets, management of buffer flow, encryption services, and management of 
Network Interface Controllers are accelerated through the use of a multiplicity of specialized 
modules. A language interface is defined for specifying both stateless and stateful classification 
of packets and to associate actions with classification results in order to efficiently utilize these 
specialized modules. 

Described herein are techniques to perform reassembly of a Transmission Control 
Protocol (TCP) data stream from payloads of TCP segments of a bidirectional TCP connection 
between a first TCP end-point operating at a first network device and a second TCP end-point 
operating at a second network device. 